Optimal Epidemic Control in Equilibrium with Imperfect Testing and Enforcement

Thomas Phelan and Alexis Akira Toda

April 9, 2021

Abstract. We analyze equilibrium behavior and optimal policy within a Susceptible-Infected-Recovered epidemic model augmented with potentially undiagnosed agents who infer their health status and a social planner with imperfect enforcement of social distancing. We define and prove the existence of a perfect Bayesian Markov competitive equilibrium and contrast it with the efficient allocation subject to the same informational constraints. We identify two externalities, static (individual actions affect current risk of infection) and dynamic (individual actions affect future disease prevalence), and study how they are affected by limitations on testing and enforcement. We prove that a planner with imperfect enforcement will always wish to curtail activity, but that these incentives vanish as testing becomes perfect. When a vaccine arrives only far into the future, a planner with perfect enforcement may encourage activity before herd immunity is reached. We find that lockdown policies have modest welfare gains, whereas quarantine policies are effective even with imperfect testing.

Soon after evidence of the first community spread of the Coronavirus Disease 2019 outside China was reported in Italy in late February 2020, European countries promptly introduced drastic mitigation measures ("lockdowns") such as the closure of schools, restaurants, and other businesses. Many states and provinces in the United States and Canada, as well as countries around the world, followed suit by mid-March. While implementing policies to slow the spread of an infectious disease appears to be an obvious course of action for a prudent government, such mitigation policies have evidently not been without costs. Curtailing economic activity can cause unemployment, bankruptcy, and reduced access to education. Further, engaging in non-pharmaceutical interventions (better known as "social distancing") reduces new infections but also delays the achievement of herd immunity, and so may prolong the epidemic in the absence of a vaccine. It is therefore possible that lockdown policies slow the progress of the epidemic but do little to alter its ultimate toll.

Although it appears we are now in the later stages of the pandemic, with several vaccines developed and administered, the immense disruption to economic and social activity wrought by the virus and the possibility of future pandemics (either due to variants or new viruses) motivate a theoretical analysis of an epidemic model suitably augmented with realistic features to capture policy-relevant tradeoffs. In this paper we build upon the standard Kermack and McKendrick (1927) Susceptible-Infected-Recovered (SIR) model and add two important features. First, the agents in our model are forward-looking and endogenously respond to the epidemic, but must continuously update their beliefs about their own health status because they may lack symptoms or testing may not be available. Second, we study the extent to which policy prescriptions depend upon the ability of the government to enforce its recommended actions over the long term.
The first feature is motivated by the fact that the infection fatality rate (IFR, the fraction of deaths among all cases) of COVID-19 inferred from seroprevalence studies is an order of magnitude smaller than the case fatality rate (CFR, the fraction of deaths among confirmed cases), which suggests substantial underreporting. [1] The second is motivated by the fact that some governments were both slow to impose social distancing measures and may lack the ability to enforce such measures over the long run, possibly due to opposition from constituents. To the best of our knowledge, no existing work considers these two features and studies the role they play in shaping optimal policy responses.

[1] According to the meta-analysis of seroprevalence studies by Ioannidis (2021), the median IFR of COVID-19 is 0.27%. On the other hand, as we document in Section 4, the median CFR across more than 200 countries and regions is 1.35%. Thus the reporting rate is of the order 0.27/1.35 = 20%.

Our model works as follows. The society consists of four behavioral types of agents: unknown, infected, recovered, and dead. The unknown type consists of agents who lack immunity (are susceptible) against the infectious disease as well as those who are infected or recovered but unconfirmed due to imperfect testing or asymptomatic infections. Each period, alive agents take actions a ∈ [0, 1], which we interpret as their overall level of economic activity. We assume that in the absence of an epidemic, agents prefer taking the highest action a = 1. During an epidemic, taking higher actions exposes oneself to the risk of infection. As a result, rational agents without confirmed immunity (the unknown type) optimally choose lower actions, i.e., they voluntarily practice social distancing. Known infected and recovered agents have no incentive to social distance and choose the highest action available to them. We define a perfect Bayesian Markov competitive equilibrium to be an allocation in which (i) agents form beliefs about their health status and optimize given the state variables (population shares of each type) and (ii) the evolution of beliefs and state variables is consistent with Bayes' rule and the collective behavior of agents.

We obtain two main theoretical results. First, we prove the existence of a (pure strategy) perfect Bayesian Markov competitive equilibrium (Markov equilibrium for short). This result is important because to achieve the equilibrium, individuals and policy makers only need to form expectations about the future given a few state variables and do not require implausibly sophisticated coordination among them. The equilibrium allocation is in general inefficient due to the externality caused by the actions of infected agents. Existing analyses of the pandemic have focused either upon allocations in which a planner dictates all activity in perpetuity, [2] or upon laissez-faire allocations in which no social distancing is imposed. [3] Although obviously informative, neither of these cases models the problem of a planner who was previously slow to act, or who believes their capacity to enforce restrictions may dissipate over time. The recursive nature of our solution methods allows us to address this situation, as we compute equilibrium and efficient activity levels at every point in the state space. In order to highlight the role that imperfect enforcement plays in optimal policy responses, we distinguish between two types of externalities: static and dynamic.
The static externality arises because the activity of infected agents affects the probability that susceptible agents become infected in the current period. The dynamic externality arises because the collective behavior of agents affects the evolution of the prevalence of the virus throughout the population. The interplay between the static and dynamic externalities is subtle, as they can move in different directions: when an individual chooses higher activity, they increase the risk of infecting their fellow citizens today, but if they become infected they reduce the risk they pose to others in the future.

This brings us to the second theoretical result. We prove that the difference between the static efficient action (the optimal choice taking future prevalence as given) and the equilibrium action is bounded above by a number proportional to the fraction of unknown infected agents, a quantity that vanishes as the probability of diagnosis converges to 1. This shows that a government that can only enforce short-term lockdown policies will always wish to curb activity, but its incentive to do so vanishes with the fraction of unconfirmed cases. This observation is noteworthy because we also provide examples in which a government with unlimited enforcement power wishes to recommend higher activity than that which occurs in equilibrium. This phenomenon arises in situations in which the vaccine is not expected to arrive until far into the future, so that herd immunity is the only possible end to the pandemic. In these circumstances a benevolent government may wish to encourage activity until the mass of susceptible agents is sufficiently small, before imposing social distancing to minimize the total number of infections (and hence deaths) necessary to reach herd immunity. It is in this sense that we show how policy prescriptions can depend crucially on enforcement capabilities.

The presence of diagnosed and undiagnosed infected agents implies that there are two tools for intervention available to the planner: activity recommendations for unknown agents (referred to as lockdown policies) and recommendations for known infected agents (referred to as quarantine policies). This combination of undiagnosed agents and possibly imperfect enforcement capabilities is precisely what makes the planner's problem difficult. Indeed, if all infected agents could be immediately and costlessly quarantined, then the pandemic would likely not have had the immense economic and social impact that we have observed over the past year. For the majority of this paper, we therefore focus on the recommendations to unknown agents, taking as given a fixed level of activity by the known infected agents.

To illustrate our theory as well as to study the optimal interventions, we calibrate our model to the current COVID-19 epidemic and conduct a number of robustness exercises. Due to endogenous social distancing, in the Markov equilibrium the infection curve flattens relative to the case of myopic behavior (the standard SIR model). We find that the planner's optimal interventions are significantly affected by the diagnosis rate of infections, the vaccine arrival rate, and the planner's ability to enforce policies over the long term. The welfare gains from lockdown policies tend to be small, reducing the welfare loss from the pandemic relative to the equilibrium by less than 10%.
When the vaccine is expected to arrive within a year or two (as could be expected at the beginning of the COVID pandemic), the optimal reduction in activity begins earlier, is more gradual, and extends well beyond the date at which activity has returned to normal in the equilibrium allocation. In contrast, when the vaccine is expected to arrive decades into the future, the optimal policy is to encourage (discourage) agents to take high actions before (after) achieving herd immunity, so that herd immunity is achieved quickly but unnecessary deaths are avoided. In contrast to lockdown policies, we find that quarantine policies are effective even with imperfect testing.

Kermack and McKendrick (1927) present the basic mathematical framework for studying the evolution of infectious diseases. Models that build upon this framework assume that agents can be placed into categories based on their health status, that there is a fixed probability that an infected agent passes the infection to a susceptible agent when they meet, and that there is a fixed rate at which infected agents recover or die. In the simplest formulation, there are only susceptible and infected agents. These SI models are appropriate for infectious diseases that are incurable but not deadly. [4] SIS models are ones in which infected agents can recover, but when they do, they become susceptible to reinfection. SIR models are ones in which infected agents can recover (or die) and acquire lifelong immunity. [5]

Although mathematical epidemic models provide insights into the spread of infectious diseases, they often ignore individual choice or public policy. [6] There are several papers that modify the basic SIR model to allow for either government policies or individual decisions to influence the course of the epidemic. We describe the models as non-strategic if the government has the ability to mandate changes through quarantines, lockdowns, or other non-pharmaceutical interventions. We call the models strategic if individual agents independently decide levels of care that influence exposure to the disease.

Sethi (1978) examines the problem of a planner who can choose to quarantine a fraction of infected agents in an SIS model. With linear payoffs and costs, he identifies a bang-bang solution in which the planner either quarantines all infected agents or none of them. More recently, Kruse and Strack (2020) incorporate social distancing in a non-strategic model of infections and study a version of the SIR model in which a social planner can, at a cost, influence the transmission rate. With linear costs, they show that socially optimal policies are bang-bang: the social planner either reduces the transmission rate as much as possible or does not reduce it at all. It is typically not optimal to reduce the transmission rate when the fraction of infected agents is small. For some of their analysis, Kruse and Strack (2020) assume that the planner can impose social distancing for no more than a fixed length of time. With this restriction, lockdowns should start only after the number of infected agents reaches a threshold.

Turning to strategic models, Geoffard and Philipson (1996), Kremer (1996), and Auld (2003) study the extent to which strategic choices may undermine the effects of public policy regarding the spread of an infectious disease. These papers focus on HIV (human immunodeficiency virus) infections, where the heterogeneity of the population and the ability to select whom to interact with are of first-order importance.
Reluga (2010), Chen et al. (2011), Fenichel et al. (2011), Chen (2012), Fenichel (2013), and Toxvaerd (2019) present strategic models of social distancing that predate the current COVID-19 epidemic. Fenichel (2013), which is an extended analysis of Fenichel et al. (2011), assumes that agents can select the intensity of their interaction with others and that flow utility is a single-peaked concave function of this intensity. He contrasts socially optimal choices of these contact levels with privately selected values and points out that if the social planner cannot distinguish between groups (and therefore any restriction on interactions must apply to susceptible, infected, and recovered agents alike), then social welfare may be higher in the laissez-faire equilibrium than in the constrained social planner's problem. This possibility arises because the planner's intervention constrains the participation of recovered agents (who generate positive externalities) in addition to the participation of infected agents. Chen et al. (2011) and Chen (2012) study a static game in which susceptible agents decide on their level of activities. This game may exhibit multiple equilibria, which are typically inefficient. Whether the susceptible agents are more or less active than at the social optimum depends on the nature of the matching technology. More recently, Toxvaerd (2019) points out that in the presence of strategic agents, public policy interventions that lower the infectiousness of a disease may lower social welfare because agents respond to the change by increasing their own exposure.

Following the onset of the COVID-19 epidemic, a large number of papers have been written by economists. Since this literature is too large to review, we only discuss the subset of papers that focus on theory and applications. Abel and Panageas (2020) study an optimal control problem as well as the laissez-faire equilibrium in an SIR model with population growth and show that a steady state exists and the disease becomes endemic no matter how large the cost from excess deaths is. Budish (2020) conceptualizes R ≤ 1 (effective reproduction number less than 1) as a constraint, discusses the optimal policy in a static setting with heterogeneous economic activities, and illustrates that cheap policies such as mask wearing go a long way in containing the virus spread with minimal welfare costs. Toxvaerd (2020) studies an SIR model with endogenous social distancing, which is similar to ours. Assuming linear utility and costs, he shows that susceptible agents either engage in no social distancing at all or social distance so as to maintain a target peak prevalence, which endogenously flattens the infection curve. Relative to this small literature of theoretical strategic epidemic models, our main contribution is that we explicitly model imperfect testing and enforcement and systematically study the welfare implications and optimal policies.

We introduce rational and potentially undiagnosed Bayesian agents into the basic Kermack and McKendrick (1927) Susceptible-Infected-Recovered (SIR) epidemic model. We consider an infectious disease that can be transmitted between agents in a society, which consists of a large (but finite) number of agents indexed by n ∈ N = {1, . . . , N}. Time is discrete, runs forever, and is denoted by t = 0, ∆, 2∆, . . . , where ∆ > 0 is the length of time in one period.
Agent types and information. At each point in time, agents are categorized into several types based on their health status and information. An agent who does not have immunity against the infectious disease is called susceptible and denoted by S. An infected agent who is known (unknown) to be infected is denoted by I_k (I_u). An alive agent with immunity who is known (unknown) to be immune is called removed (or recovered) and denoted by R_k (R_u). A dead agent is denoted by D. The set of all types (health status) is denoted by

H = {S, I_k, I_u, R_k, R_u, D}.

For instance, an agent could be I_k if he tests positive for antigen or he shows specific symptoms (is symptomatic), and I_u if no tests are available and he shows no specific symptoms (is asymptomatic). Similarly, an agent could be R_k if he tests positive for antibody, he recovered from past symptomatic infection and immunity is lifelong, or he is vaccinated. Thus the set of information types is

{U, I_k, R_k, D},

where U denotes the unknown type. When no confusion arises, we refer to an I_k agent as "infected" and an R_k agent as "removed/recovered", without the qualifier "known".

Importantly, we suppose that when an agent gets infected, with probability σ ∈ (0, 1] the agent receives a signal that reveals the true health status (known infected, I_k). Otherwise, the agent becomes unknown infected (I_u). Although we refer to the signal as a "test", the signal could be literally a laboratory test as well as other information such as the presence of specific symptoms, knowledge of close contacts with confirmed cases, etc. We refer to the probability σ as the diagnosis rate. In Appendix B we show that a model with a single signal is observationally equivalent to a model with multiple signals with potentially heterogeneous fatality rates.

Let N_h ⊂ N be the set of agents with health status h ∈ H. With a slight abuse of notation, we also use the same symbol h to denote the fraction of type h agents in the population, so h = |N_h|/N. The space of the aggregate state (type distribution) is denoted by

Z = { z = (S, I_k, I_u, R_k, R_u, D) ∈ {0, 1/N, . . . , 1}^6 : S + I_k + I_u + R_k + R_u + D = 1 }.   (2.1)

We suppose that the aggregate state is observable. (I_k, R_k, D are observable, and I_u and R_u can be inferred from small-scale random antigen and antibody testing.) The economy starts at t = 0 with some initial condition z_0 ∈ Z.

Actions and preferences. At each point in time, each alive agent takes an action a ∈ A = [ā, 1], where the minimum action ā satisfies 0 ≤ ā < 1. We interpret a as the economic activity level: loosely, a = 1 corresponds to following a normal life and a = ā to being completely locked down (the minimum activity level for subsistence). The utility function of an unknown (U) and known recovered (R_k) agent is denoted by u : A → R. The utility function of a known infected agent (I_k) is denoted by u_I : A → R. A dead agent receives the flow utility u_D ∈ R. [7] Agents discount future payoffs with discount factor e^{−r∆}, where r > 0 is the discount rate.

[7] More precisely, u_D is the flow utility of being dead anticipated by alive agents.

Disease transmission. Agents meet each other randomly over time and transmit the infectious disease. If agents n, n' take actions a_n, a_{n'} respectively, then agent n bumps into agent n' during a period with probability λ∆ a_n a_{n'}/N, where λ ∈ (0, 1/∆) is a parameter (the meeting rate) that governs the level of social interaction with full activity (a_n = a_{n'} = 1). We take λ as given; it depends on how the society is organized (e.g., population density, whether workers commute by cars or public transportation, whether consumers shop online or at physical stores, whether classes are taught remotely or in-person, etc.).
If agent n is susceptible (n ∈ N_S) and agent n' is infected (n' ∈ N_{I_k} ∪ N_{I_u}), the infectious disease is transmitted from n' to n with probability τ ∈ (0, 1] conditional on n bumping into n' at time t. [8] We also take τ as given; it depends on how contagious the disease is as well as how the society is organized (e.g., how often people wash their hands, whether they wear masks, whether they greet others by bowing, shaking hands, hugging, or kissing, etc.). We assume that ∆ is small enough that in any period an agent bumps into at most one other agent. Therefore, if a_h := (Σ_{n'∈N_h} a_{n'})/|N_h| denotes the average action of type h, a particular susceptible agent n who takes action a gets infected with probability

β∆ a (a_{I_k} I_k + a_{I_u} I_u),   (2.2)

where β := τλ is the baseline transmission rate and we have used the notation h = |N_h|/N for h = I_k, I_u.

[8] Thus the act of "bumping into" is asymmetric between the members in a meeting. Figuratively, here is what happens if agent n bumps into n' (in the case of an upper respiratory infection such as COVID-19): (i) agent n meets agent n', (ii) agent n' sneezes into agent n's face (and transmits the disease to n with probability τ if n' is infected), and (iii) they part with each other.

The timing convention is that if an infection occurs at time t, the (previously susceptible, now infected) agent changes status to I_k or I_u at time t + ∆. Since only susceptible agents are prone to infection, the expected fraction of the population that gets newly infected between time t and t + ∆ is

β∆ a_S S (a_{I_k} I_k + a_{I_u} I_u),   (2.3)

which is called incidence in epidemiology. The fraction of infected agents

I := I_k + I_u   (2.4)

is called prevalence.

Recovery and death. An infected agent is removed (becomes no longer infected by either recovering or dying) with probability γ∆ each period, where γ ∈ (0, 1/∆] is the removal rate. Conditional on being removed, a known infected (I_k) agent dies with probability δ ∈ (0, 1] and an unknown infected (I_u) agent always recovers. The rationale for this assumption is that infected agents with more severe symptoms are more likely to get tested as well as to die. Letting δ_0 be the fatality rate among all (known and unknown) infected agents, since the diagnosis rate is σ, we have

δ = δ_0/σ.   (2.5)

Finally, recovered or dead agents remain in their corresponding states forever, implying that recovered agents acquire lifelong immunity (see Footnote [5]). Again the timing convention is that if an infected agent is removed at time t, the agent changes status to R_k, R_u, or D at time t + ∆.

In epidemiology, there are several notions of fatality rate, and it is important to understand the distinction. The fatality rate among all (known and unknown) infected cases (which corresponds to δ_0) is called the infection fatality rate (IFR). The fatality rate among known (confirmed) infected cases (which corresponds to δ if the signal is a laboratory test) is called the case fatality rate (CFR). The fatality rate among the entire population is called mortality. Clearly, by definition we have Mortality ≤ IFR ≤ CFR.

Vaccine arrival. We assume that a vaccine arrives at Poisson rate ν ≥ 0, independent of everything else. Thus in our discrete-time setting, the probability that a vaccine arrives between time t and t + ∆ is 1 − e^{−ν∆}. We assume that the vaccine is perfectly effective, perfectly safe, and has no cost. Thus once a vaccine arrives, all non-infected agents are vaccinated and become immune (R_k). The vaccine is not a cure and hence has no effect on infected agents.
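To fix ideas, the following minimal sketch (the function name and interface are ours, not the paper's) advances the expected population shares by one period using the transmission, diagnosis, and removal mechanics just described. Vaccine arrival (probability 1 − e^{−ν∆}) is handled separately since it effectively ends the epidemic.

```python
def step_epidemic(S, Ik, Iu, a_S, a_Ik, a_Iu, beta, gamma, sigma, delta0, dt):
    """One period of the transition dynamics of Section 2 (expected fractions).

    beta = tau * lambda is the baseline transmission rate, sigma the diagnosis
    rate, and delta = delta0 / sigma the fatality rate among diagnosed (2.5).
    """
    # Infection probability of a susceptible agent taking action a_S, as in (2.2)
    p_inf = beta * dt * a_S * (a_Ik * Ik + a_Iu * Iu)
    new_inf = p_inf * S                  # incidence (2.3)
    removed_k = gamma * dt * Ik          # removed known infected
    removed_u = gamma * dt * Iu          # removed unknown infected (all recover)
    delta = delta0 / sigma               # (2.5): only known infected die
    S_next = S - new_inf
    Ik_next = Ik + sigma * new_inf - removed_k        # diagnosed with prob. sigma
    Iu_next = Iu + (1 - sigma) * new_inf - removed_u  # undiagnosed otherwise
    new_deaths = delta * removed_k       # contributes to D at t + dt
    return S_next, Ik_next, Iu_next, new_deaths
```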
Throughout the rest of the paper, we maintain the following assumptions.

Assumption 1 (Utility function). The utility functions satisfy the following conditions: (i) u : A = [ā, 1] → R is twice continuously differentiable and satisfies u(1) = 0, u' > 0, and u'' < 0; (ii) u_I : A = [ā, 1] → R is continuous, strictly concave, and achieves a unique maximum at a_I ∈ A; and (iii) u_D < u_I(a_I) ≤ u(1).

The assumption u(1) = 0 simplifies the algebra and is without loss of generality because we can shift the utility functions by a constant without affecting behavior. The assumptions u_D < u_I(a_I) ≤ u(1) simply imply that being asymptomatic is preferable to being symptomatic, which is in turn preferable to being dead. The condition that u_I is single-peaked at a_I ∈ A implies that a potentially intermediate value of the activity level (rest) is myopically optimal for symptomatic agents. This assumption can also be interpreted as altruism, a sense of duty, or the enforcement of a quarantine policy.

Assumption 2 (Perfect competition). Agents view the evolution of the aggregate state z as exogenous and ignore the impact of their own behavior on the aggregate state.

Assumption 2 is necessary for analytical tractability and is reasonable when the number of agents N is large.

Assumption 3 (Consistency). On equilibrium paths, agents update their beliefs using Bayes' rule. Off equilibrium paths, unknown (U) agents believe they are susceptible with probability

µ(z) = S/(S + I_u + R_u) if S + I_u + R_u > 0, and µ(z) = 0 otherwise.   (2.6)

The assumption that agents apply Bayes' rule may not be realistic because a pandemic such as COVID-19 is rare and agents may have difficulty forming beliefs when faced with an unprecedented situation. However, we focus on Bayes' rule because it provides a benchmark analysis. That we specify the off-equilibrium beliefs as in (2.6) suggests that our equilibrium concept is a perfect Bayesian equilibrium (PBE). We discuss this point in more detail in Section 3.2.

This section defines and establishes the existence of equilibrium and characterizes individual behavior. We formalize the individual optimization problem recursively. Let z ∈ Z be the aggregate state and V_h be the value function of type h ∈ {U, I_k, R_k, D} agents.

Dead agents. Because dead agents remain dead and their flow utility is u_D, we have V_D = u_D (with the normalization that flow utilities are weighted by 1 − e^{−r∆}).

Known recovered agents. Because known recovered agents have lifelong immunity, their value function is constant and the Bellman equation is

V_{R_k} = max_{a∈A} { (1 − e^{−r∆}) u(a) + e^{−r∆} V_{R_k} }.

The optimal action is clearly a_{R_k} = 1 (full activity) and the value function is V_{R_k} = u(1) = 0 by Assumption 1.

Known infected agents. By assumption, known infected agents are removed with probability γ∆, and conditional on removal, die with probability δ = δ_0/σ. Since the health status transitions are independent of the aggregate state and the action, the value function is constant and the Bellman equation is

V_{I_k} = max_{a∈A} { (1 − e^{−r∆}) u_I(a) + e^{−r∆} [ γ∆ (δ V_D + (1 − δ) V_{R_k}) + (1 − γ∆) V_{I_k} ] }.   (3.1)

By Assumption 1 the function u_I is single-peaked at a_I, and so the optimal action is a_{I_k} = a_I. Since V_{R_k} = u(1) = 0 and V_D = u_D, (3.1) simplifies to

V_{I_k} = [ (1 − e^{−r∆}) u_I + e^{−r∆} γ∆ δ u_D ] / ( 1 − e^{−r∆}(1 − γ∆) ),   (3.2)

where u_I := u_I(a_I). Note by Assumption 1 that we have u_D < u_I ≤ 0 and hence V_D < V_{I_k} ≤ 0.

Unknown agents. Because unknown agents need to infer their health status, the analysis is more complicated. Suppose unknown agents adhere to a policy function a_U : Z → A and that they always hold the belief (2.6). (We will verify that these assumptions are satisfied in equilibrium.)
The policy a_U(z), together with the mechanisms of disease transmission, symptom development, recovery, and death, generates transition probabilities {q(z, z')}_{(z,z')∈Z²} for the aggregate state conditional on no vaccine arrival. (Note that Z in (2.1) is a finite set.) By Assumption 2, agents view this law of motion as exogenous. Let V_U(z) be the value function of an unknown agent who chooses the action optimally in this environment. By (2.2) and the analysis of known infected (I_k) agents, an agent taking the full action (a = 1) gets infected with probability

p(z) = β∆ (a_I I_k + a_U(z) I_u)   (3.3)

conditional on being susceptible. Noting that an infection is diagnosed with probability σ and the vaccine arrives with probability 1 − e^{−ν∆}, the Bellman equation is

V_U(z) = max_{a∈A} { (1 − e^{−r∆}) u(a) + e^{−r∆} [ σµpa V_{I_k} + (1 − σµpa) e^{−ν∆} E_z V_U(z') ] },   (3.5)

where E_z denotes the expectation with respect to {q(z, z')}, µ = µ(z) is given by (2.6), p = p(z) is given by (3.3), and V_{I_k} is given by (3.2).

The following proposition establishes the existence and uniqueness of V_U and provides some bounds on the value functions.

Proposition 3.1 (Value functions). Fix a policy function a_U : Z → A of unknown agents. Then there exists a unique value function V_U : Z → R satisfying the Bellman equation (3.5). Furthermore, the value functions satisfy the inequalities

V_D < V_{I_k} < V_U ≤ V_{R_k} = 0.   (3.6)

The proof of Proposition 3.1, as well as other longer proofs, is deferred to Appendix A. The inequality (3.6) is quite intuitive. In terms of flow utility, having no symptoms is better than having symptoms, which is better than death. Because the states R_k, D are absorbing and a known infected agent may recover or die, the inequalities V_D < V_{I_k} < V_{R_k} = 0 are immediate. The inequality V_U ≤ V_{R_k} is also immediate because an unknown agent could get infected and generally chooses a lower action. The inequality V_{I_k} < V_U follows from the fact that an unknown agent can always choose the myopically optimal action (a = 1, which generates flow utility 0 = u(1) ≥ u_I(a_I) = u_I) but gets infected only in the future.

The following proposition characterizes the best response of an unknown agent. To state the result, we define the inverse marginal utility function φ : (0, ∞) → A by

φ(y) = 1 if y ≤ u'(1), φ(y) = (u')^{−1}(y) if u'(1) < y < u'(ā), and φ(y) = ā if y ≥ u'(ā).   (3.7)

Proposition 3.2 (Best response of U agents). Fix a policy function a_U : Z → A and a continuation value V_U : Z → R of unknown agents. Then the best response of an unknown agent is

a* = φ( (e^{−r∆}/(1 − e^{−r∆})) σ µ(z) p(z) ( E_z e^{−ν∆} V_U(z') − V_{I_k} ) ),   (3.8)

where µ(z) and p(z) are given by (2.6) and (3.3), respectively.

Because u' is strictly decreasing, φ in (3.7) is decreasing. As a result, we immediately obtain the following corollaries.

Corollary 3.3 (Comparative statics). The best response a* of an unknown agent in (3.8) increases if (i) ν decreases (vaccine arrival less likely), (ii) µ(z) or p(z) decreases (infection less likely), (iii) δ_0 decreases (death less likely), or (iv) u_I or u_D increases (infection and death less scary).

Corollary 3.3 provides comparative statics results in a partial equilibrium setting (where the policy function a_U(z) and the continuation value V_U(z) are exogenously given). The result that a* decreases in µ(z), p(z), δ_0 and increases in u_I, u_D is intuitive: increasing µ(z), p(z), δ_0 or decreasing u_I, u_D makes the risk of infection or death higher, which makes agents take more precautions. The fact that the possibility of vaccine arrival (higher ν) makes agents take more precautions is also intuitive: when agents expect a vaccine to arrive soon, being locked down is less painful because the lockdown is expected to be short. As we shall see later in our numerical analysis, the vaccine arrival rate ν significantly affects the socially efficient action.
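As an illustration of Proposition 3.2, the following sketch evaluates the best response (3.8) under the CRRA utility used later in Section 4 (u(a) = (a^{1−α} − 1)/(1 − α), so u'(a) = a^{−α} and the inverse marginal utility has a closed form). The names and interface are ours, and the continuation values are taken as given.

```python
import numpy as np

def best_response(sigma, mu, p, EV_U, V_Ik, r, nu, dt, alpha=1.0, a_bar=0.01):
    """Best response (3.8) of an unknown agent under CRRA utility.

    sigma: diagnosis rate; mu, p: belief (2.6) and infection probability (3.3);
    EV_U: expected continuation value E_z[V_U(z')]; V_Ik: value (3.2).
    """
    disc = np.exp(-r * dt)
    # Argument of the inverse marginal utility function phi in (3.8)
    x = disc / (1.0 - disc) * sigma * mu * p * (np.exp(-nu * dt) * EV_U - V_Ik)
    # For CRRA, u'(a) = a**(-alpha), so (u')^{-1}(x) = x**(-1/alpha);
    # phi in (3.7) clips the result to the action space A = [a_bar, 1].
    a = x ** (-1.0 / alpha) if x > 0 else 1.0
    return float(np.clip(a, a_bar, 1.0))
```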
The next corollary shows that when prevalence is sufficiently low, agents take no precautions.

Corollary 3.4 (Full activity with sufficiently low prevalence). Let I := I_k + I_u be the prevalence defined in (2.4). There exists Ī > 0 such that for all policy functions a_U(z), we have a* = 1 whenever I < Ī. In particular, we can take

Ī = (1 − e^{−r∆}) u'(1) / ( e^{−r∆} σ β∆ (−V_{I_k}) ),   (3.9)

where V_{I_k} < 0 is given by (3.2).

Letting Ī be as in (3.9) and using (3.6), u_I ≤ 0, and (2.5), we obtain an explicit lower bound for Ī, equation (3.10). Since u_D < 0, the right-hand side of (3.10) is increasing in ν, r, u'(1), u_D and decreasing in β, γ, δ_0. Thus impatience, higher marginal utility of the full action, less disutility from death, a lower transmission rate, longer duration of infections, and lower mortality all make agents more likely to take the full action. The intuition is identical to Corollary 3.3.

Our equilibrium concept is the (pure strategy) perfect Bayesian Markov competitive equilibrium defined below. Here "perfect Bayesian" means that agents update beliefs on equilibrium paths using Bayes' rule as in Assumption 3; "Markov" means that the optimal actions agents choose are functions of the state variables; and "competitive" means that agents view the evolution of the aggregate state variables as exogenous as in Assumption 2.

Definition 3.5 (Markov equilibrium). A (pure strategy) perfect Bayesian Markov competitive equilibrium, or Markov equilibrium for short, consists of unknown agents' belief µ(z) of being susceptible, transition probabilities {q(z, z')}_{z,z'∈Z} for the aggregate state, value functions {V_h(z)}_{h=U,I_k,R_k,D}, and policy functions {a_h(z)}_{h=U,I_k,R_k} such that:

(i) (Consistency) The belief µ(z) satisfies Bayes' rule on equilibrium paths and is given by (2.6) off equilibrium paths; the transition probabilities {q(z, z')} are consistent with individual actions and the mechanisms of disease transmission, symptom development, recovery, and death.

(ii) (Sequential rationality) (U) V_U(z) satisfies the Bellman equation (3.5) and a = a_U(z) achieves the maximum, where E_z denotes the conditional expectation using {q(z, z')} and p = p(z) is as in (3.3); (I_k) V_{I_k} satisfies (3.1) and a_{I_k}(z) = a_I achieves the maximum; (R_k) V_{R_k} = 0 and a_{R_k}(z) = 1.

Note that Definition 3.5 only describes the society before vaccine arrival. Once the vaccine arrives, because no new infections occur by assumption, it is optimal for all agents to take their myopically optimal action (a = 1 for h = U, R_k and a = a_I for h = I_k) and the problem becomes trivial.

The astute reader may wonder why we adopt the notion of perfect Bayesian equilibrium and specify that beliefs are given by (2.6) even off the equilibrium path. The belief µ(z) equals the posterior belief if unknown agents have a common prior, take identical actions, and learn that the aggregate state is z, but an unknown Bayesian agent who is contemplating deviating from the equilibrium action a_U(z) generally has a different posterior. The primary reason for our choice of solution concept is tractability, as we wish to study the role of forward-looking agents who are uncertain about their health status without obscuring the analysis with technicalities. [9] The notion of perfect Bayesian equilibrium allows agents to be forward-looking and rational, if somewhat "forgetful". In Appendix C we extend the model to the case with perfect recall and show that our results are robust.

The following theorem establishes the existence of equilibrium.

Theorem 3.6 (Existence of equilibrium). Suppose Assumptions 1-3 hold. Then there exists a pure strategy perfect Bayesian Markov competitive equilibrium in which the belief µ(z) always satisfies (2.6).

The idea of the proof of Theorem 3.6 is to start with a guess of the equilibrium policy a_U(z), update it as the (necessarily unique) maximizer in (3.5), show that this updating rule is continuous, and apply the Brouwer fixed-point theorem to establish a fixed point of this operation.
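The same updating rule also suggests a numerical algorithm. Below is a stylized sketch of this fixed-point iteration; solve_bellman and best_response_policy are hypothetical helpers standing in for the value iteration of Proposition 3.1 and the maximizer of Proposition 3.2, and Brouwer's theorem guarantees existence of a fixed point but not convergence of this naive iteration.

```python
def solve_equilibrium(Z, a_init, solve_bellman, best_response_policy,
                      tol=1e-8, max_iter=10_000):
    """Iterate the best-response map on policy functions over the state grid Z.

    A sketch: solve_bellman(Z, a_U) returns the unique V_U solving (3.5)
    given the policy a_U (Proposition 3.1); best_response_policy(Z, a_U, V_U)
    returns the maximizer of (3.5) state by state (Proposition 3.2).
    """
    a_U = dict(a_init)  # initial guess, e.g. a_U(z) = 1 for all z
    for _ in range(max_iter):
        V_U = solve_bellman(Z, a_U)
        a_new = best_response_policy(Z, a_U, V_U)
        if max(abs(a_new[z] - a_U[z]) for z in Z) < tol:
            return a_new, V_U  # approximate Markov equilibrium
        a_U = a_new
    raise RuntimeError("best-response iteration did not converge")
```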
In this section we study the source of externalities in the model and the efficiency properties of the equilibrium. Let a_U(z) and a_{I_k}(z) be any policy functions for unknown agents and known infected agents exogenously chosen by the social planner. Since the behavior of known recovered agents does not affect the aggregate state dynamics, without loss of generality we set a_{R_k}(z) = 1, which is individually optimal. Suppose the planner wishes to choose (a_U(z), a_{I_k}(z)) to maximize social welfare. In general, individual agents have a strong incentive to deviate from such recommendations and choose the individually optimal actions characterized by Proposition 3.2. Thus let λ ≥ 0 be the hazard rate of failing to enforce the recommended policy (a_U(z), a_{I_k}(z)) (not to be confused with the meeting rate λ of Section 2), and suppose that once the policy implementation fails, the society reverts to the Markov equilibrium characterized in Theorem 3.6. Letting V^λ_h(z; a_U, a_{I_k}) be the value function of type h agents in this environment, since the probability of reverting to equilibrium each period is 1 − e^{−λ∆}, we have

V^λ_h(z) = (1 − e^{−r∆}) u_h(a_h(z)) + e^{−r∆} E_z [ (1 − e^{−λ∆}) V_{h'}(z') + e^{−λ∆} V^λ_{h'}(z') ],   (3.11)

where the expectation is over the next period's individual type h' and aggregate state z' induced by (a_U(z), a_{I_k}(z)) as in (3.1) and (3.5), V_U(z) and V_{I_k} are the equilibrium value functions in Theorem 3.6, µ = µ(z) and p = p(z) are given by (2.6) and (3.3), and we write a_h = a_h(z) and V^λ_h(z) = V^λ_h(z; a_U, a_{I_k}). By a standard contraction mapping argument, V^λ_h is well-defined. The utilitarian social welfare associated with the policies (a_U(z), a_{I_k}(z)) is then

W(z) = U V^λ_U(z; a_U, a_{I_k}) + I_k V^λ_{I_k}(z; a_U, a_{I_k}) + D u_D,   (3.12)

where U = S + I_u + R_u is the fraction of unknown agents and we have used V_{R_k} = 0 and V_D = u_D. Noting that u_D is constant and D depends only on z (and not on the policies (a_U(z), a_{I_k}(z))), conditional on the aggregate state z we can rank the social welfare associated with particular policy functions by using

W̃(z) = U V^λ_U(z; a_U, a_{I_k}) + I_k V^λ_{I_k}(z; a_U, a_{I_k}).   (3.13)

Given the hazard rate λ, the optimal policies (a_U(z), a_{I_k}(z)) maximize (3.13).

The Markov equilibrium in Definition 3.5 is generally inefficient because the equilibrium policies (a*_U(z), a*_{I_k}(z)) do not maximize the welfare criterion (3.13) due to externalities. We identify two types of externalities, static and dynamic. The static externality refers to the fact that when (known or unknown) infected agents take higher actions, they infect susceptible agents with some probability and affect the current value function of unknown agents, even if the future value functions and the aggregate state transitions remain the same. The dynamic externality refers to the fact that the collective behavior of agents affects the aggregate state transitions and hence future value functions. Because studying the dynamic effect analytically is challenging, we focus on the static effect in this section and study the dynamic externality in our numerical exercises.

To study the static externality, imagine that the social planner can intervene to alter agents' current actions (a_U, a_{I_k}), fixing the transition probabilities {q(z, z')} of the aggregate state as well as next period's value functions {V_h(z')}_{h=U,I_k,R_k,D}. Using (3.1), (3.5), (3.13), and noting that V_{R_k} = 0, we can define the objective function of the planner who seeks to eliminate the static externality by

W(a_U, a_{I_k}; z) = U [ (1 − e^{−r∆}) u(a_U) + e^{−r∆} ( σµp a_U V_{I_k} + (1 − σµp a_U) e^{−ν∆} E_z V_U(z') ) ] + I_k [ (1 − e^{−r∆}) u_I(a_{I_k}) + e^{−r∆} ( γ∆ δ u_D + (1 − γ∆) V_{I_k} ) ],   (3.14)

where p = β∆(a_{I_k} I_k + a_U I_u). Restricting the action of known infected agents can be interpreted as quarantine. Restricting the action of unknown agents can be interpreted as lockdown.
Hence we introduce the following definition.

Definition 3.7 (Static efficient actions). Fix transition probabilities {q(z, z')} and value functions V_U, V_{I_k}. Given a policy function a_U(z) of unknown agents, we say that the quarantine policy a†_{I_k}(z) is static efficient if a_{I_k} = a†_{I_k}(z) achieves the maximum of (3.14). Similarly, given a policy function a_{I_k}(z) of known infected agents, we say that the lockdown policy a†_U(z) is static efficient if a_U = a†_U(z) achieves the maximum of (3.14).

The reason we use the qualifier "static" can be understood as follows. Imagine that a benevolent government implements some lockdown policy when faced with an epidemic. The optimal policy then obviously depends on the scale and duration of the intervention. If the intervention is implemented on a large scale, the dynamic effects cannot be ignored. Furthermore, because individuals have an incentive to deviate from lockdown policies, the optimal policies when the government has unlimited enforcement power may differ from a short-term policy. The policies in Definition 3.7 are efficient in the sense that the government seeks to implement the optimal policy on a small scale while anticipating the continuation values V_U, V_{I_k} (which are the equilibrium value functions if the government can commit only for one period).

The following proposition characterizes the static efficient quarantine policy.

Proposition 3.8 (Static efficient quarantine policy). Suppose the utility function of known infected agents u_I is continuously differentiable and strictly concave, with inverse marginal utility function φ_I analogous to (3.7). Then the static efficient quarantine policy is given by

a†_{I_k}(z) = φ_I( (e^{−r∆}/(1 − e^{−r∆})) σ µ(z) β∆ a_U(z) U ( E_z e^{−ν∆} V_U(z') − V_{I_k} ) ).   (3.15)

Proof. Let W be as in (3.14). Using p = β∆(a_{I_k} I_k + a_U I_u), we obtain

∂W/∂a_{I_k} = I_k (1 − e^{−r∆}) u_I'(a_{I_k}) − U e^{−r∆} σ µ β∆ I_k a_U ( E_z e^{−ν∆} V_U(z') − V_{I_k} ).

The rest of the proof is the same as that of Proposition 3.2.

Proposition 3.8 has several implications. First, (3.15) does not depend on the fraction of known infected agents I_k except through the continuation values. This is because the welfare gain from reducing new infections and the welfare loss from restricting infected agents' actions are both proportional to I_k, as we can see from the proof of Proposition 3.8. Second, under normal circumstances the planner seeks to quarantine known infected agents intensively. To see this, using (3.3), note that the individually optimal action of unknown agents (3.8) can be rewritten as

a*_U(z) = φ( (e^{−r∆}/(1 − e^{−r∆})) σ µ(z) β∆ ( a_{I_k}(z) I_k + a_U(z) I_u ) ( E_z e^{−ν∆} V_U(z') − V_{I_k} ) ).   (3.16)

Assuming that infected and unknown agents have identical preferences (so u_I = u), the only difference between (3.15) and (3.16) is that the argument of the former is proportional to a_U U, whereas the argument of the latter is proportional to a_{I_k} I_k + a_U I_u. In normal situations, the fraction of actively infected agents in the society is rather small, so I_k, I_u ≪ U. Then the argument of (3.15) is a much bigger number than that of (3.16), and because φ is decreasing, we would have a†_{I_k} ≪ a*_U.
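A back-of-the-envelope comparison of the two arguments (with illustrative numbers of our own choosing) makes the point concrete:

```python
# Illustrative magnitudes (ours): 1% prevalence, 40% diagnosed, near-full activity.
U, I, sigma = 0.97, 0.01, 0.4
I_k, I_u = sigma * I, (1 - sigma) * I
a_U, a_Ik = 0.9, 1.0

arg_quarantine = a_U * U               # the factor entering (3.15)
arg_lockdown = a_Ik * I_k + a_U * I_u  # the factor entering (3.16)
print(arg_quarantine / arg_lockdown)   # about 93; since phi is decreasing, the
                                       # planner restricts I_k agents far more
```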
We next study the optimal intervention for unknown agents. Suppose all known infected agents take action a_I ∈ A, which can be interpreted as an exogenous quarantine policy by the discussion after Assumption 1. Let a*_U(z) be the equilibrium policy of unknown agents and V_U, V_{I_k} the corresponding value functions established in Theorem 3.6. The following theorem provides a bound for the static efficient lockdown policy, which is our main theoretical result.

Theorem 3.9 (Static efficient lockdown policy). Let a*_U(z) be the equilibrium policy of unknown agents and m = min_{a∈A} |u''(a)| > 0. There exists a unique static efficient lockdown policy a†_U(z), which satisfies

0 ≤ a*_U(z) − a†_U(z) ≤ (e^{−r∆}/(1 − e^{−r∆})) σ µ(z) β∆ ( E_z e^{−ν∆} V_U(z') − V_{I_k} ) I_u / m.   (3.17)

In particular, if σ = 1 (so I_u = 0), then a†_U(z) = a*_U(z).

Theorem 3.9 has several implications. First, the inequality a*_U ≥ a†_U implies that the equilibrium action of unknown agents is too high relative to the static efficient lockdown policy. This is intuitive, because a subset of unknown agents are infected and impose a negative externality on others by going out and infecting susceptible agents. Second, and more interestingly, the difference between the static efficient and equilibrium actions is bounded above by a number proportional to the fraction of unknown infected agents. This quantity is typically small, and approaches zero as σ ↑ 1. A government that can only enforce short-term lockdown policies will therefore always (weakly) wish to curb activity, but its incentive to do so vanishes with the fraction of unconfirmed cases. This is particularly noteworthy because we shall later provide examples in which a government with unlimited enforcement power would wish to do the reverse and recommend higher activity than that which occurs in equilibrium. It is in this sense that policy prescriptions can depend crucially on the enforcement capabilities of the government.

This section studies the equilibrium dynamics. Given arbitrary policy functions {a_h(z)}_{h=U,I_k,R_k}, we define the transition probabilities {q(z, z')} induced by these policies and the mechanisms of disease transmission, symptom development, recovery, and death. To simplify the notation, let z_t = z, S_t = S, S_{t+∆} = S_{+∆}, etc. Using (2.2), it is straightforward to show that the expected evolution of (S, I_k, I_u) is

S_{+∆} = S − β∆ a_U(z) (a_{I_k}(z) I_k + a_U(z) I_u) S,   (3.18a)
I_{k,+∆} = (1 − γ∆) I_k + σ β∆ a_U(z) (a_{I_k}(z) I_k + a_U(z) I_u) S,   (3.18b)
I_{u,+∆} = (1 − γ∆) I_u + (1 − σ) β∆ a_U(z) (a_{I_k}(z) I_k + a_U(z) I_u) S.   (3.18c)

(We omit the dynamics for R_k, R_u, D agents because they do not depend on the policy functions.) We can simplify these equations further if we consider the limit N → ∞ and apply the strong law of large numbers. In the large population limit, letting I = I_k + I_u be the fraction of infected agents, we have I_k = σI and I_u = (1 − σ)I. Therefore, adding (3.18b) and (3.18c), we obtain the system of deterministic difference equations

S_{+∆} = S − β∆ a_U(z) (σ a_{I_k}(z) + (1 − σ) a_U(z)) SI,   (3.19a)
I_{+∆} = I + β∆ a_U(z) (σ a_{I_k}(z) + (1 − σ) a_U(z)) SI − γ∆ I.   (3.19b)

The quantity

R_t = (β/γ) a_U(z) (σ a_{I_k}(z) + (1 − σ) a_U(z)) S   (3.20)

is known as the effective reproduction number in epidemiology. At the early stage of the epidemic, by definition we have S ≈ 1 and I ≈ 0. Therefore, by Corollary 3.4 (and assuming u_I(a) = u(a)), the equilibrium policies are a_U(z) = a_{I_k}(z) = 1, and (3.20) reduces to

R_0 = β/γ,   (3.21)

which is known as the basic reproduction number. When unknown agents are myopic (so a_U(z) = a_{I_k}(z) = 1), (3.19) reduces to

S_{+∆} = S − β∆ SI,   (3.22a)
I_{+∆} = I + (βS − γ)∆ I.   (3.22b)

Since by (3.19a) the fraction of susceptible agents S always decreases over time, the fraction of infected agents I decreases over time once

S < 1/R_0   (3.23)

holds, where R_0 is the basic reproduction number in (3.21). We say that the society has achieved herd immunity if (3.23) holds.
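For reference, here is a minimal simulation of the myopic dynamics (3.22), which coincide with the standard SIR model; the parameter values anticipate the calibration of the next section.

```python
def simulate_myopic(beta=1/5.4, gamma=1/13.5, dt=1.0, T=600, I0=1e-6):
    """Simulate the myopic SIR dynamics (3.22) with a_U = a_Ik = 1."""
    S, I, path = 1.0 - I0, I0, []
    for t in range(T):
        path.append((t, S, I))
        # (3.22a)-(3.22b): both updates use the current (S, I)
        S, I = S - beta * dt * S * I, I + (beta * S - gamma) * dt * I
    return path

# Herd immunity (3.23): prevalence starts declining once S < 1/R0 = gamma/beta.
```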
In this section we use a numerical example calibrated to the COVID-19 epidemic to study how the agents' optimizing behavior, the diagnosis rate, and lockdown policies affect the epidemic dynamics. One period corresponds to a day, and we assume 5% annual discounting, so r = 0.05/365.25 and ∆ = 1. Because COVID-19 vaccines became available about one year after the start of the epidemic, we set the vaccine arrival rate to ν = 1/365.25. Unless otherwise stated, we suppose that the social planner has unlimited enforcement power (λ = 0 in (3.11)-(3.13)). We set the epidemic parameters from the medical literature.

The daily transmission rate is β = 1/5.4 based on the median serial interval (the number of days it takes for an infected person to transmit the disease to another person) in the meta-analysis of Rai et al. (2021). The daily recovery rate is γ = 1/13.5 based on the median infectious period estimated in You et al. (2020). These numbers imply that the basic reproduction number in (3.21) is R_0 = β/γ = 13.5/5.4 = 2.5, which equals the current best estimate used by the Centers for Disease Control and Prevention (CDC). [10] The infection fatality rate (IFR) is δ_0 = 0.0027, which is the median value in the meta-analysis of Ioannidis (2021) on studies that use seroprevalence data. [11]

[10] https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html

[11] Unlike the case fatality rate (CFR), which is defined as the reported number of deaths divided by the reported number of cases, the estimation of the IFR is complicated by the fact that cases and deaths may be underreported. The true number of cases can be estimated from random antibody testing, which is called a seroprevalence survey. If we assume that underreporting of deaths is not severe, then the IFR can be estimated by dividing the reported number of deaths by the estimated number of cases.

The choice of the diagnosis rate σ is more controversial. One possibility is to estimate the case fatality rate (CFR) δ and set σ = δ_0/δ based on (2.5). Using the data on the cumulative number of reported cases and deaths provided by the Johns Hopkins University Center for Systems Science and Engineering, [12] we find that the median CFR across all (more than 200) regions is δ = 0.0134. This would imply σ = δ_0/δ = 0.2 (20%). However, this calculation ignores other information such as the presence of symptoms. Another possibility is to set σ to the fraction of symptomatic agents. Based on the case study of the cruise ship Diamond Princess, which experienced a COVID-19 outbreak in February 2020 and whose passengers were all tested, Mizumoto et al. (2020) document that about 50% of confirmed cases were asymptomatic. Noting that the symptoms of COVID-19 are often similar to those of other upper respiratory infections and not specific enough to confirm the diagnosis, as a compromise we choose an intermediate value σ = 0.4 for the diagnosis rate in our baseline analysis.

[12] https://github.com/CSSEGISandData/COVID-19

We set the minimum action to ā = 0.01 (which is somewhat arbitrary but never binds in simulations). The utility function of a symptom-free agent exhibits constant relative risk aversion (CRRA) α > 0, so

u(a) = (a^{1−α} − 1)/(1 − α) for α ≠ 1 and u(a) = log a for α = 1,

which satisfies Assumption 1. We suppose u_I = u, so the optimal action for known infected agents is a_I = 1. Although this assumption is unrealistic because infected agents may be incapacitated, altruistic, or quarantined, it provides the most conservative (worst-case) analysis. For numerical illustrations, we set α = 1 (log utility). We calibrate the flow utility from death to u_D = −12.22 based on the case study of Sweden, which did not introduce mandatory lockdowns (see Appendix E for details). Finally, we set the initial condition to I_0 = 10^{−6}, S_0 = 1 − I_0, R_0 = 0, and D_0 = 0.

We solve for the perfect Bayesian Markov competitive equilibrium using the algorithm discussed in Appendix D. As a point of comparison, we also solve for the myopic allocation in which all agents choose a = 1, as well as the static efficient and efficient actions, which correspond to setting λ = ∞ and λ = 0 in (3.11)-(3.13), respectively.
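For reproducibility, the baseline calibration can be collected as follows (a sketch; the dictionary keys are our own naming):

```python
params = dict(
    dt=1.0,              # one period = one day
    r=0.05 / 365.25,     # 5% annual discount rate
    nu=1 / 365.25,       # vaccine arrival rate (expected within about a year)
    beta=1 / 5.4,        # daily transmission rate (median serial interval)
    gamma=1 / 13.5,      # daily removal rate (median infectious period)
    delta0=0.0027,       # infection fatality rate (IFR)
    sigma=0.4,           # baseline diagnosis rate
    a_bar=0.01,          # minimum action
    alpha=1.0,           # CRRA coefficient (log utility)
    u_D=-12.22,          # flow utility of death (Sweden calibration)
    I0=1e-6,             # initial prevalence
)
R0 = params["beta"] / params["gamma"]       # basic reproduction number: 2.5
delta = params["delta0"] / params["sigma"]  # case fatality rate (2.5): 0.675%
```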
Figure 1 shows the epidemic dynamics studied in Section 3.4 for the myopic equilibrium, which is the standard Kermack and McKendrick (1927) SIR model. Here and elsewhere, the vertical dashed line indicates the first date of achieving herd immunity as defined by (3.23). When agents are myopic and choose a = 1, as is well known, the epidemic has two phases: in the first phase, the fraction of infected agents initially grows exponentially (at daily rate approximately β − γ = γ(R_0 − 1), obtained by setting S = 1 in (3.22b)) until the society achieves herd immunity (peak prevalence is 23.9%); in the second phase, the fraction of infected agents decays exponentially at daily rate γ.

The epidemic dynamics change significantly once we introduce optimizing behavior. Figure 2 shows the epidemic dynamics (left panels) and contour plots of recommended actions over the state space (right panels). The top and bottom panels are for the Markov equilibrium and efficient allocations, respectively. We now make a few observations regarding the efficient and equilibrium allocations. First, when agents are forward-looking, the epidemic has three phases: during the first phase, the disease spreads freely and we see exponential growth (peak prevalence is 6.62%); during the second phase, unknown agents voluntarily practice social distancing (set a*_U < 1) and the disease spread is endogenously mitigated; during the third phase, the society achieves herd immunity and the fraction of infected agents decays exponentially. Second, the epidemic dynamics and recommended action for the solution to the planner's problem (bottom panels) are qualitatively different from the Markov equilibrium. The planner who can enforce and commit to a lockdown policy finds it optimal to substantially reduce the action of unknown agents (bottom right). In the resulting dynamics (bottom left), the fraction of infected agents both grows and declines more slowly, and it takes almost 50% longer to reach herd immunity.

Figure 3 plots the time paths of recommended actions (equilibrium and efficient) corresponding to the dynamics in the left panels of Figure 2. As expected, we see that the efficient action is almost everywhere below the equilibrium action, implying that the planner wishes to curb activity more quickly and for longer than do the individual agents. Further, compared with the equilibrium allocation, the reduction in activity in the efficient allocation begins earlier, is more gradual, and extends well beyond the date at which activity has returned to normal in the equilibrium allocation. However, as we noted in the introduction, the efficient and equilibrium time paths plotted in Figure 3 do not necessarily provide adequate guidance to a government that was either slow to react to the initial infection or has limited ability to carry out its desired intervention. We emphasize that Figure 3 ought to be interpreted as plotting the activity levels across two fictitious countries: one that pursued no non-pharmaceutical interventions at all, and one that followed the optimal path from the outset of the virus. We therefore believe it is instructive to complement the foregoing and examine how the recommendations of governments with varying degrees of enforcement ability vary along the same path of the state variables.
To analyze this point and to illustrate the theoretical analysis in Section 3.3, Figure 4 plots the equilibrium level of activity together with the static efficient action (λ = ∞) and the action the planner would recommend (λ = 0) along the equilibrium path. As Theorem 3.9 suggests, the static efficient path is everywhere below the equilibrium path, and so a government that can only enforce lockdowns for a short period of time would unambiguously wish to do so. However, it does not follow that a government with the ability to perfectly enforce activity levels in perpetuity would wish to reduce activity. Indeed, in this example the difference between the efficient and equilibrium activity levels cannot be unambiguously signed, for at some points in the middle of the pandemic the activity choice of the planner exceeds that of the agents in equilibrium. We interpret this last observation as illustrating that, relative to the efficient allocation, the competitive equilibrium exhibits inefficiently volatile consumption, with abrupt changes over time that the planner may wish to avoid with more gradual increases and decreases in activity.

We have so far assumed that unknown infected agents account for 1 − σ = 60% of all infected agents. To illustrate the role of imperfect testing and reporting (and to therefore relate our results to the existing literature) we now investigate the disease dynamics in the case in which σ = 1. Figure 5 shows the epidemic dynamics (left panels), contour plots of recommended actions over the state space (right panels), and the recommended actions over time (bottom panel). The top panels are for the Markov equilibrium and the bottom panels are for the efficient actions. When σ = 1, all infected agents are known, and by assumption they all take action a_I = 1. Because the average action of infected agents is higher, the negative externality from infected agents is larger and unknown (in this case, susceptible) agents reduce their action more relative to the Markov equilibrium with σ < 1. The resulting efficient lockdown policy appears qualitatively different from the case with imperfect testing, with the planner essentially delaying any reduction in activity until the date at which the agents begin social distancing in equilibrium. Further, in this case the efficient and equilibrium paths appear qualitatively similar, except that the latter is more pronounced in its departure from "normality".

To further our understanding of these comparative statics, we compute the welfare cost and death toll across different specifications of the diagnosis rate σ. For welfare, we use the utilitarian social welfare W in (3.12) and apply the inverse utility function u^{−1} to convert it into units of activity. Since the welfare without the epidemic equals the highest action 1, we can compute the welfare cost of the epidemic given the current state z ∈ Z as

C(z) = 1 − u^{−1}(W(z)).

If we identify activity with current output, we can interpret C(z) as the fraction of permanent consumption the society is willing to give up to avoid the epidemic.
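Under CRRA utility this conversion has a closed form; a small sketch (our own helper, assuming W is expressed in flow-utility units):

```python
from math import exp

def welfare_cost(W, alpha=1.0):
    """Consumption-equivalent cost C(z) = 1 - u^{-1}(W(z)) for CRRA u with u(1) = 0."""
    if alpha == 1.0:  # log utility: u(a) = log(a), so u^{-1}(W) = exp(W)
        return 1.0 - exp(W)
    return 1.0 - (1.0 + (1.0 - alpha) * W) ** (1.0 / (1.0 - alpha))

# For instance, W = -0.0182 with log utility gives a cost of about 1.8%.
```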
For the death toll, we use the cumulative deaths (per 100,000 population) after 600 days conditional on no vaccine arrival. Table 1 shows the results, where the columns labeled "Myopic", "PBE", and "SPP" correspond to the myopic equilibrium (standard SIR model), the perfect Bayesian Markov competitive equilibrium, and the social planner's problem (efficient action), respectively. The epidemic has a large welfare cost of about a 1.8% reduction in permanent consumption. Interestingly, for all reporting levels examined, the difference in welfare is larger between the equilibrium and myopic allocations than between the equilibrium and efficient allocations, implying that the welfare gains from the optimal lockdown policy are modest.

The analysis so far assumes that known infected (I_k) agents choose the highest action a_I = 1. Although this is unrealistic because infected agents may self-isolate due to incapacitation or altruism, it provides the most conservative analysis (worst-case scenario). As a complementary analysis, we solve the model assuming that known infected agents choose a_I = 0.4 yet enjoy the highest possible utility (u_I = 0). The choice of a_I = 0.4 is motivated by the empirical study of He et al. (2020), who document that 44% of secondary cases were infected during the presymptomatic stage (prior to diagnosis) of the primary cases. This assumption corresponds to the immediate isolation of infected cases upon diagnosis, which provides the best-case analysis. Table 2 shows the welfare cost and death toll with maximal quarantine. In each case, the welfare gains from quarantine are substantial even with a relatively low diagnosis rate σ. In contrast, the additional welfare gains from the optimal lockdown policy are modest. As is intuitive, the difference between the high and low quarantine allocations is particularly stark when the diagnosis rate is high. For σ = 1, individuals take no precautions in either the efficient or equilibrium allocations, since the level of infections necessary to reach herd immunity is very small.

Our model has so far assumed that the vaccine is expected to arrive within one year. Consequently, it is likely that the vaccine will have arrived before the attainment of herd immunity. How much does this assumption affect the optimal policies? Should we expect qualitatively similar features to emerge when a vaccine is unlikely to be forthcoming? As an extreme example, we solve for the equilibrium and efficient actions when the vaccine arrives very far into the future (T = 100 years). Figure 6 shows the epidemic dynamics (left panels), contour plots of recommended actions over the state space (right panels), and the recommended action over time (bottom panel). Compared to the case with vaccines (Figure 2, where ν = 1/365.25), the equilibrium dynamics and action (top panels) without vaccines are essentially identical. This is because the possibility of vaccine arrival changes the argument of the individually optimal action (3.8) by a factor of e^{−ν∆}, which is close to 1 even if ν is as large as 1/365.25. However, the possibility of vaccine arrival significantly affects the optimal lockdown policy. Comparing the efficient actions in the bottom right panels of Figures 2 and 6, we can see that when a vaccine is not expected, the planner encourages (discourages) agents to take high actions before (after) achieving herd immunity, similar to the case with perfect reporting (Figure 5). Although this result may be counterintuitive, the intuition is actually straightforward. In the absence of a vaccine, the only way to end the epidemic is to achieve herd immunity. Therefore the planner wishes to initially exacerbate the epidemic to reach herd immunity quickly (thus reducing the cost of low action) and then mitigate the epidemic after herd immunity to reduce unnecessary deaths.
We can clearly see this from the bottom right panel of Figure 6, where we plot the recommended action along the equilibrium path (not over time), meaning that we are plotting the recommended actions at the same points in the state space. Table 3 shows the welfare cost and death toll when the vaccine is expected to take T = 100 years to arrive. As expected, the welfare cost of the epidemic is higher since it takes longer for the vaccine to arrive. More interestingly, the bottom panels of Figures 5 and 6 show that the optimal policy is to encourage (discourage) agents to take high action before (after) achieving herd immunity. Our intuition for this is that when it takes a long time until vaccine arrival, the planner wishes to reduce abrupt fluctuations in activity and also to ensure that herd immunity is not excessively "overshot", in order to minimize deaths. Figure 7 reinforces this intuition and shows that the share of susceptible agents is reduced more rapidly and plateaus more smoothly in the efficient allocation, so as to effect a "smooth landing" out of the pandemic.

In this paper we have theoretically studied optimal epidemic control in an equilibrium model with imperfect testing and enforcement. We proved the existence of a perfect Bayesian Markov competitive equilibrium and showed that a government that can only enforce lockdowns for a short period of time will wish to (weakly) reduce activity, but that its incentive to do so vanishes if testing is perfect. We then showed through numerical examples that optimal policy is highly dependent upon the government's ability to enforce social distancing over the long term, and that if a vaccine is not expected to ever arrive, the optimal activity recommended by the government may exceed the equilibrium activity levels until the population of susceptible agents is sufficiently low. In contrast, when a vaccine is expected to arrive within a few years (as could reasonably be expected at the outset of the COVID-19 epidemic), the optimal policy typically involves an immediate and prolonged curtailment of social activity.

To prove Proposition 3.1, we note the following simple lemma.

Lemma A.1. Let X be a complete metric space and T : X → X be a contraction with fixed point x* ∈ X. If ∅ ≠ X₁ ⊂ X is closed and T X₁ ⊂ X₁, then x* ∈ X₁.

Proof. Take any x₀ ∈ X₁ and define xₙ = Tⁿx₀ for n ∈ ℕ. Since by assumption T X₁ ⊂ X₁, by induction we have xₙ ∈ X₁ for all n. By the contraction mapping theorem, we have xₙ → x*. Since X₁ is closed, we have x* ∈ X₁.

Proof of Proposition 3.1. Let X = ℝ^Z. For V ∈ X, define T V(z) by the right-hand side of (3.5), where V_U = V. Since A = [ā, 1] is nonempty and compact, u : A → ℝ is continuous, and Z is a finite set, the maximum is achieved and T : X → X is well-defined. Let us show that T is a contraction by verifying Blackwell's (1965) sufficient conditions. If V₁, V₂ ∈ X and V₁ ≤ V₂ pointwise, then since 0 ≤ σµpa ≤ 1, it follows from the definition of T that T V₁ ≤ T V₂, so T is monotone. If V ∈ X and c ≥ 0 is any constant, then T(V + c) ≤ T V + e^{−(r+ν)∆}c, so the discounting property holds with modulus e^{−(r+ν)∆} < 1. Therefore T is a contraction mapping and has a unique fixed point V_U ∈ X. Because u : A = [ā, 1] → ℝ is continuous (hence bounded), it is straightforward to verify the transversality condition. Therefore V_U is the value function. Next, let us show the inequalities (3.6).
Noting that u_D < u_I ≤ 0, r > 0, γ > 0, δ ∈ (0, 1], and that V_{I_k} is given by (3.2), we obtain the upper bound in (3.6). To obtain the lower bound for V_U, define the constant c₁ ∈ (0, 1) and set V₁ := c₁e^{ν∆}V_{I_k}. Note that c₁e^{ν∆} < 1, which is true because σ ∈ (0, 1], β∆ ∈ [0, 1], r > 0, and ν ≥ 0. Since c₁e^{ν∆} < 1 and V_{I_k} < 0, it follows that V₁ > V_{I_k}, which is part of (3.6). Let us next show V_U ≥ V₁. To this end, define the (nonempty, closed) subset of X by X₁ = {V ∈ X | V ≥ V₁}. By Lemma A.1, to show V_U ≥ V₁ it suffices to show T X₁ ⊂ X₁. If V ∈ X₁, then by the definition of T we obtain the lower bound (A.2). Setting a = 1 on the right-hand side of (A.2) and noting that u(1) = 0, we obtain a further lower bound. Therefore, to show T X₁ ⊂ X₁, noting that V_{I_k} < 0, it suffices to verify an inequality in the model parameters. By (3.3), a_U(z) ≤ 1, and the definition of Z in (2.1), we have p = β∆(a_I I_k + a_U(z)I_u) ≤ β∆. Since µ ∈ [0, 1] and c₁ ∈ (0, 1), the required inequality follows.

Next, to derive the optimal action (3.8), let f(a) be the expression inside the braces in (3.5). Since by Assumption 2 individual agents view the next period's state z′ as exogenous, V_U(z′) does not depend on a, and f is strictly concave. Hence, considering the cases f′(ā) ≷ 0 and f′(1) ≷ 0, the optimal action is given by (3.8).

Proof of Corollary 3.3. Since 0 ≥ V_U > V_{I_k} by (3.6) and e^{−ν∆} ≤ 1, we always have $\mathbb{E}_z[e^{-\nu\Delta}V_U(z')] - V_{I_k} > 0$. By (3.6) and (3.8), the best response can be rewritten as a* = φ(x), where φ is decreasing and x is the argument defined in (A.4). Since r > 0, u_D < u_I ≤ 0, and V_U ≤ 0, it is straightforward to show that x is decreasing in u_I, u_D and increasing in ν, µ(z), p(z), δ. Therefore the desired comparative statics results hold because φ is decreasing.

Proof of Corollary 3.4. Let x in (A.4) be the argument of φ in (3.8). Using (2.5), (3.2), and the bound V_U ≤ 0 in (3.6), we can bound x from above as in (A.5). Using µ(z) ≤ 1, a_U(z) ≤ 1, and p(z) = β∆(a_I I_k + a_U(z)I_u) ≤ β∆(I_k + I_u) = β∆I by (3.3), it follows from (A.5) and V_{I_k} < 0 that the bound (A.6) holds. Therefore, if I ≤ Ī, where Ī is as in (3.9), it follows from (A.6) that x ≤ u′(1). Therefore a* = φ(x) = 1 by (3.7).

To prove Theorem 3.6, we consider a general stochastic game with complete information and finitely many competitive agents denoted by n ∈ N = {1, . . . , N}. Suppose that there are finitely many agent types indexed by h ∈ H = {1, . . . , H}. At each point in time, the aggregate state of the economy is denoted by z ∈ Z, which is a finite set. A type h agent takes action x ∈ A_h. Let $A = \prod_{h=1}^H A_h$. Let u_h(x, a, z) be the flow utility of a type h agent when the agent takes action x ∈ A_h, the average action of other agents is a ∈ A, and the aggregate state is z ∈ Z. Type h agents discount future utility with discount factor ρ_h ∈ [0, 1). Time is denoted by t = 0, 1, . . . . The aggregate state z evolves stochastically depending on agents' average actions and the current state. Let q(a, z, z′) be the probability of z_{t+1} = z′ conditional on (a_t, z_t) = (a, z) ∈ A × Z. When computing q(a, z, z′), agents take the average action a as given and ignore the impact of their own action x on a (competitive behavior). An agent's type evolves stochastically depending on the agent's own action, the other agents' average actions, and the current state. Let p_{hh′}(x, a, z) be the probability that a type h agent switches to type h′ next period when the agent takes action x ∈ A_h, the average action is a ∈ A, and the current state is z ∈ Z. Value functions are denoted by V = (V₁, . . . , V_H), where V_h : Z → ℝ. Reaction functions are denoted by θ = (θ₁, . . . , θ_H), where θ_h : Z → A_h.

Definition A.2.
A (pure strategy) Markov competitive equilibrium is a pair (V, θ) of value and reaction functions such that: (i) (Consistency) The transition probability of the aggregate state is consistent with individual behavior, so Pr(z′ | z) = q(θ(z), z, z′). (ii) (Sequential rationality) For each h ∈ H and z ∈ Z, the Bellman equation (A.7) holds, and x = θ_h(z) achieves the maximum.

Theorem A.3. Suppose the following assumptions hold: (i) For each h ∈ H, the action set A_h is a nonempty compact convex subset of some Euclidean space. (ii) For each h ∈ H, the payoff function u_h : A_h × A × Z → ℝ is continuous in (x, a, z) and strictly concave in x. (iii) The transition probability q : A × Z × Z → [0, 1] is continuous. (iv) For each h, h′ ∈ H, the transition probability p_{hh′} : A_h × A × Z → [0, 1] is continuous in (x, a, z) and affine in x. Then there exists a pure strategy Markov competitive equilibrium.

To prove Theorem A.3, we need the following lemma.

Lemma A.4. Let (X, d) be a complete metric space and Θ be a topological space. Endow X × Θ with the product topology. Suppose T : X × Θ → X is continuous and there exists β ∈ [0, 1) such that
$$d(T(x, \theta), T(y, \theta)) \le \beta d(x, y) \tag{A.8}$$
for all x, y ∈ X and all θ ∈ Θ. Then the following statements are true: (i) For each θ ∈ Θ, there exists a unique x*(θ) ∈ X such that T(x*(θ), θ) = x*(θ). (ii) x* : Θ → X is continuous.

Proof. The first claim is immediate from the contraction mapping theorem. To show the second claim, for simplicity write T_θ x := T(x, θ). Fix any (x₀, θ) ∈ X × Θ and define xₙ = T_θⁿ x₀ for n ∈ ℕ. Then by the triangle inequality and the contraction property (A.8), we have $d(x_0, x_n) \le \sum_{k=0}^{n-1} d(x_k, x_{k+1}) \le \frac{1-\beta^n}{1-\beta} d(x_0, T_\theta x_0)$. Letting n → ∞ and noting that xₙ → x*(θ) by the contraction mapping theorem, we have
$$d(x_0, x^*(\theta)) \le \frac{1}{1-\beta} d(x_0, T_\theta x_0). \tag{A.9}$$
Since x₀ ∈ X and θ ∈ Θ are arbitrary in (A.9), set θ = θ′ and x₀ = x*(θ), where θ, θ′ ∈ Θ are arbitrary. Since by definition T_θ x₀ = x₀, it follows that
$$d(x^*(\theta), x^*(\theta')) \le \frac{1}{1-\beta} d(T_\theta x_0, T_{\theta'} x_0), \tag{A.10}$$
where x₀ = x*(θ). Since T is continuous and x₀ = x*(θ) depends only on θ, for any ε > 0 there exists an open neighborhood U of θ such that
$$d(T_\theta x_0, T_{\theta'} x_0) < (1-\beta)\varepsilon \tag{A.11}$$
whenever θ′ ∈ U. Combining (A.10) and (A.11), for all θ′ ∈ U we have d(x*(θ′), x*(θ)) < ε, so x* : Θ → X is continuous.

Proof of Theorem A.3. For each θ ∈ A^Z, let T_θ denote the Bellman operator in (A.12). A straightforward application of the maximum theorem implies that (T_θ V)_h(z) is continuous in θ. Since H is a finite set, we have ρ := max_h ρ_h ∈ [0, 1). It is then straightforward to verify Blackwell's (1965) sufficient conditions, and for fixed θ ∈ A^Z, the map V ↦ T_θ V is a contraction mapping with modulus ρ. It follows from Lemma A.4 that T_θ has a unique fixed point, which is continuous in θ. Since by assumption u_h(x, a, z) is continuous and strictly concave in x and p_{hh′}(x, a, z) is affine in x, the objective function inside the braces of (A.12) (where V_h(z) = V*_h(z, θ)) is continuous and strictly concave in x. Therefore there exists a unique maximizer, which we call x = a*_h(z, θ). By the maximum theorem, a*_h(z, θ) is continuous in θ. Since H, Z are finite sets, we may view a* : A^Z → A^Z as a continuous map. Since A^Z is nonempty, compact, and convex, by Brouwer's fixed point theorem a* has a fixed point. Letting θ = (θ_h(z)) be this fixed point, it is clear that all conditions in Definition A.2 are satisfied.

Proof of Theorem 3.6. We apply Theorem A.3. The agent type is denoted by h ∈ H = {U, I_k, R_k, D}, which is finite. The aggregate state is denoted by z ∈ Z defined in (2.1), which is finite. Define the action set of type h agents by A_h = [ā, 1] for h ≠ D and A_D = {0}, which are nonempty compact convex subsets of ℝ.
Define the utility function u_h of each type from the flow utilities in Section 2. Note that each u_h is continuous in (x, a, z) and strictly concave in x ∈ A_h. (Although u_D is a constant function, it is strictly concave in x because its domain A_D = {0} is a singleton.) Define the transition probabilities of individual states p_{hh′}(x, a, z) as follows. For R_k and D agents, because they remain in their corresponding state forever, the transition probabilities are 0 or 1, which are clearly continuous in all variables and affine in x. For I_k agents, by assumption they die with probability γ∆δ. Hence p_{I_k D}(x, a, z) = γ∆δ, which is continuous in all variables and constant (hence affine) in x. The same is true for p_{I_k h} for any h ∈ H. By (2.2) and (3.5), the transition probabilities of U agents are linear in the agent's own action, and hence continuous in all variables and affine in x. Finally, the transition probability for the aggregate state q : A × Z × Z → [0, 1] is clearly continuous because it is determined by the current state z ∈ Z (which is finite) and individuals' actions, whose transition probabilities are all continuous. The existence of an equilibrium in the sense of Definition A.2 then follows from Theorem A.3. The resulting value and policy functions clearly satisfy Definition 3.5. Because U agents take identical actions, their beliefs always satisfy (2.6).

We need the following lemma to prove Theorem 3.9.

Lemma A.5. Let I ⊂ ℝ be an interval, f : I → ℝ continuously differentiable with f′ ≠ 0 on I, and g : I → ℝ continuous. If f(a) = g(b) for some a, b ∈ I, then
$$|a - b| \le \frac{|f(b) - g(b)|}{m}, \quad \text{where } m = \min_{x \in [a,b]} |f'(x)| > 0.$$

Proof. By the mean value theorem, there exists θ ∈ (0, 1) such that f(a) − f(b) = f′(c)(a − b), where c = (1 − θ)a + θb. Since by assumption f(a) = g(b), we obtain m|a − b| ≤ |f′(c)||a − b| = |g(b) − f(b)|, where m = min_{x∈[a,b]} |f′(x)| > 0 since f′ is continuous and f′ ≠ 0 on I. Dividing both sides by m > 0, we obtain the desired result.

Proof of Theorem 3.9. Fix z ∈ Z and let µ = µ(z) and a*_U = a*_U(z). Define the functions f, g : A → ℝ as the objective functions whose maximizers are the equilibrium and static efficient actions, respectively, where (with a slight abuse of notation) p(x) := β∆(a_I I_k + x I_u). Taking the derivatives of f, g, one can show that both are strictly concave, where the key inequalities use u″ < 0 and e^{−ν∆}V_U(z′) ≥ V_U(z′) > V_{I_k} by Proposition 3.1. Since A = [ā, 1] is nonempty, compact, and convex, and f, g are continuous and strictly concave, they achieve unique maxima. By Definition 3.5, the maximum of f equals a*_U. Since the last term of (3.14) does not depend on a*_U, by Definition 3.7 the maximum of g equals a†_U = a†_U(z).

Let us first show a*_U − a†_U ≥ 0. Note that at x = a*_U we have g′(a*_U) ≤ f′(a*_U). If f′(a*_U) ≤ 0, then g′(a*_U) ≤ 0; since g is strictly concave and a†_U achieves its maximum, it must be that a†_U ≤ a*_U. If f′(a*_U) > 0, since f is strictly concave, it must be that a*_U = 1, and therefore a†_U ≤ 1 = a*_U. In either case, we have a*_U − a†_U ≥ 0.

Next, let us show the upper bound in (3.17). If f′(a*_U) < 0, since f is strictly concave, it must be that a*_U = ā. Then ā ≤ a†_U ≤ a*_U = ā, so a*_U − a†_U = 0 and the bound (3.17) is trivial. Similarly, if g′(a†_U) > 0, since g is strictly concave, it must be that a†_U = 1. Then 1 = a†_U ≤ a*_U ≤ 1, so a*_U(z) − a†_U(z) = 0 and the bound (3.17) is trivial. Therefore without loss of generality we may assume f′(a*_U) ≥ 0 ≥ g′(a†_U). If f′(a*_U) > 0 > g′(a†_U), then 1 = a*_U ≥ a†_U = ā and the bound (3.17) is trivial. Therefore we may assume either f′(a*_U) ≥ 0 = g′(a†_U) or f′(a*_U) = 0 ≥ g′(a†_U).

In Section 2 we supposed that an infected agent is informed of the infection with probability σ and that a known (unknown) infected agent dies with probability δ (0).
In this Appendix we elaborate upon this assumption and show that it can accommodate the case in which there are multiple signals. Suppose now that there are multiple signals that may be received upon infection, indexed j = 1, . . . , J. Let σ_j > 0 be the probability of receiving signal j upon infection, with $\sigma := \sum_{j=1}^J \sigma_j \in (0, 1]$. Let δ_j ∈ [0, 1] be the fatality rate conditional on receiving signal j and $\delta := \frac{1}{\sigma}\sum_{j=1}^J \sigma_j \delta_j$ be the expected fatality rate conditional on receiving any signal. For instance, the signal could encode the result of a laboratory test as well as the type and severity of symptoms. Suppose a type j known infected agent takes action a_j with associated flow utility u_j. Then by (3.2), the value function V_j of a type j agent satisfies the analogue of (3.2) with (u_I, δ) replaced by (u_j, δ_j). Therefore the expected continuation value of being known infected is $V_{I_k} = \frac{1}{\sigma}\sum_{j=1}^J \sigma_j V_j$, which satisfies (B.1), where $u_I := \frac{1}{\sigma}\sum_{j=1}^J \sigma_j u_j$ is the expected flow utility of being known infected. Because (B.1) is identical to (3.2) and the continuation values {V_j} affect the behavior of unknown agents only through V_{I_k} due to expected utility maximization, a model with multiple signals and heterogeneous fatality is observationally equivalent to a model with a single signal and uniform fatality. The flow utility u_I and action a_I in Section 2 can be interpreted as averages across known infected agents.

We now consider an extension in which we allow the beliefs of the agents to enter as a separate state variable, the evolution of which depends directly upon their actions. The evolution of the aggregate state variables and the value functions of the known infected, recovered, and dead agents are unchanged relative to the perfect Bayesian equilibrium, and we must therefore only alter the problem of the unknown agents. Now (2.6) will remain their belief in equilibrium, but not if they deviated from the equilibrium action in the past. If the average action of unknown agents is ã, an agent with belief µ who chooses activity a will believe that, if susceptible, they become infected at rate βa((1 − σ)ã + σa_I)I. Their belief that they are susceptible is then the probability that they were susceptible in the previous period multiplied by the probability that they were not infected without diagnosis during the last period; this gives the updating rule (C.1). We then define a perfect recall Markov equilibrium as an allocation in which the unknown agents optimize given the law of motion of the population shares and their beliefs, and these beliefs are consistent with equilibrium behavior and Bayes' rule. Formally, we adopt the continuous-time formulation used in the numerical analysis and proceed as follows. Rearranging (C.1) and sending ∆ → 0 gives the evolution of beliefs
$$\dot\mu = -(1-\sigma)\beta\mu\, a\big((1-\sigma)\tilde a + \sigma a_I\big) I. \tag{C.2}$$
The Hamilton–Jacobi–Bellman equation for unknown agents then treats (S, I, µ) as state variables. Given an average action ã of unknown agents, there is an associated policy function a(S, I, µ; ã) solving the problem of a given unknown agent. Now define an operator J(ã)(S, I) = a(S, I, S/[1 − σ + σS]; ã) − ã. The equilibrium notion we adopt in this section is then that of a Markov perfect equilibrium, in which all agents solve their individual problems taking the aggregate law of motion as given, and the associated law of motion is consistent with individual behavior.

Definition C.1.
A perfect recall Markov equilibrium consists of value functions V_U(S, I, µ) and V_{I_k}(S, I) for unknown agents and known infected agents together with a policy function a(S, I, µ) for unknown agents such that: • The functions V_U(S, I, µ), a(S, I, µ), and V_{I_k}(S, I) solve the problems of the unknown agents and known infected agents, respectively. • The law of motion of the aggregate state is consistent with the policy function of the unknown agents, or J(ã) = 0.

As Figure 8 shows, in the perfect recall Markov equilibrium agents take higher actions everywhere in the state space and the length of the pandemic is reduced.

Our model with (large but) finitely many competitive agents is not computationally tractable because the state space Z in (2.1) is very large. (To be specific, since there are N agents and 6 agent types, there are $\binom{N+5}{5}$ combinations of aggregate states. Even if we note that the distinction between R_k and D agents is payoff irrelevant and combine them into one type, and N is small, such as N = 100, the number of possible states $\binom{104}{4} = 4{,}598{,}126$ is very large. With a more realistic N, of the order of a million or more, the number of possible combinations is astronomical.) To make the model computationally tractable, we make two approximations. First, we suppose that the number of agents N is large and apply the strong law of large numbers, which makes the transitions of the aggregate state deterministic as in (3.19). Under this approximation, we have I_k = σI and I_u = (1 − σ)I. Since R_k + R_u + D = 1 − S − I and an infected agent becomes symptomatic with probability σ, we have R_k + D = σ(1 − S − I) and R_u = (1 − σ)(1 − S − I). Therefore the belief µ(z) in (2.6) becomes µ = S/(1 − σ + σS). Because the behavior of R_k, R_u, D agents does not affect state transitions, with a slight abuse of notation we can define the minimal state space by
$$Z = \{(S, I) : S \ge 0,\ I \ge 0,\ S + I \le 1\}. \tag{D.1}$$
Second, to aid numerical accuracy, we produce the figures in the main text by considering the continuous-time limit of our model and applying the finite-state Markov chain method of Kushner and Dupuis (1992). The existence of an approximate equilibrium can be established by arguments that are essentially identical to those in Theorem 3.6.

When solving an SIR model numerically, because the fraction of infected agents I varies by many orders of magnitude (say between 10^{−6} and 10^{−1}), it is important to use a grid that properly covers the relevant range of the state space, such as an exponential grid. In general, suppose we would like to construct an N-point exponential grid on a given interval (a, b), where the lower bound a is not necessarily positive. A natural way to deal with such a case is as follows. Constructing an exponential grid: (i) choose a shift parameter s > −a; (ii) construct an N-point evenly-spaced grid on (log(a + s), log(b + s)); (iii) take the exponential and subtract s. The remaining question is how to choose the shift parameter s. Suppose we would like to specify the median grid point as c ∈ (a, b). Since the median of the evenly-spaced grid on (log(a + s), log(b + s)) is ½(log(a + s) + log(b + s)), we need to take s > −a such that
$$\log(c + s) = \tfrac{1}{2}\big(\log(a + s) + \log(b + s)\big) \iff (c + s)^2 = (a + s)(b + s) \iff s = \frac{c^2 - ab}{a + b - 2c}.$$
Note that in this case $s + a = \frac{c^2 - ab}{a + b - 2c} + a = \frac{(c - a)^2}{a + b - 2c}$, so s + a is positive if and only if c < (a + b)/2. Therefore, for any c ∈ (a, (a + b)/2), it is possible to construct an exponentially-spaced grid with endpoints (a, b) and median point c.
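As a minimal Python sketch of this construction (the helper name exp_grid and the example parameters are ours; the example matches the 400-point I-grid on [10^{−8}, 1] with median 10^{−4} used below):

```python
import numpy as np

def exp_grid(a, b, c, n):
    """n-point exponentially spaced grid on [a, b] whose median point is c.

    Implements the three-step recipe from the text: shift by s, space evenly
    in logs, exponentiate, and shift back. Requires a < c < (a + b) / 2 so
    that the shift satisfies s > -a.
    """
    assert a < c < (a + b) / 2, "median must lie in (a, (a + b)/2)"
    s = (c**2 - a * b) / (a + b - 2 * c)  # shift that places the median at c
    return np.exp(np.linspace(np.log(a + s), np.log(b + s), n)) - s

# Example: the I-grid described below (400 points on [1e-8, 1], median 1e-4).
I_grid = exp_grid(1e-8, 1.0, 1e-4, 400)
```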
To solve the model in Section 4 numerically, we construct finite grids for S, I and define the minimal state space Z in (D.1) by the Cartesian product of the S and I grids. For the S-space (fraction of susceptible agents), we use a 100-point uniform grid on [10^{−8}, 1]. For the I-space we use a 400-point exponential grid on [10^{−8}, 1] with a median point of 10^{−4}. We write these grids as Σ_S = {0, 1/N_S, . . . , 1 − 1/N_S, 1} and Σ_I = {I_0, I_1, . . . , I_{N_I}}, and define Σ := Σ_S × Σ_I.

We consider the numerical method for computing the perfect Bayesian equilibrium. Let V_h(S, I) be the value function of type h = U, I_k agents, ã(S, I) be the policy function of unknown agents, and let partial derivatives be denoted by ∂_S, etc. In this case the Hamilton–Jacobi–Bellman equation for the unknown agent becomes the discretized equation (D.5). Omitting terms independent of a_U, dividing by r∆_t, and sending ∆_t → 0, the maximization problem becomes (D.6). Note that the linear system characterizing the value function of the unknown agents is
$$0 = r u(a_U) + \hat p_{uk} V_{I_k}(S, I) + \hat p_{-I} V_U(S, I - \Delta_I^-) + \hat p_{+I} V_U(S, I + \Delta_I^+) + \hat p_{-S} V_U(S - \Delta_S, I + \Delta_S) - \big(r + \nu + \hat p_{-S} + \hat p_{-I} + \hat p_{+I} + \hat p_{uk}\big) V_U(S, I). \tag{D.7}$$
We then iterate upon the policy function a_U. Beginning with an arbitrary guess a_U, we compute the value function of the unknown agent by solving the linear system (D.7), replace a_U with the implied policy function from (D.6), and repeat until convergence. To ensure the chain remains on the grid, we declare that at I = 1 we have a_U ≤ â₁(S), the point at which 0 = βS a_U((1 − σ)a_U + σa*_{I_k}) − γ. For S > 0 we have
$$\hat a_1(S) = \frac{1}{2\beta S(1-\sigma)}\left(-\beta S\sigma a_{I_k}^* + \sqrt{\big[\beta S\sigma a_{I_k}^*\big]^2 + 4\gamma\beta S(1-\sigma)}\right).$$

For the planner's problem (the efficient action), we then have the Bellman equation (D.11). Omitting terms independent of the controls, using (D.10), and sending ∆_t → 0, the maximization problem becomes
$$\max_{a_U}\ u(a_U) + c\, a_U\big((1-\sigma)a_U + \sigma a_{I_k}^*\big), \tag{D.12}$$
where $c := \beta S I\,(C^B_S - C^F_I)/[r(1 - \sigma + \sigma S)]$ and $C^B_S, C^F_I$ denote the backward and forward difference quotients of C in S and I, respectively. If c ≥ 0 then a_U = 1. Otherwise the objective function is concave, and we only need to evaluate the first-order conditions. The first-order condition for (D.12) when u is CRRA is $0 = a_U^{-\alpha} + c\big(\sigma a_{I_k}^* + 2(1-\sigma)a_U\big)$. To avoid overflow in the code, we divide all quantities by ∆_t and consider the limit of the above as ∆_t → 0. The linear system we solve at each stage is
$$0 = -r\big[(1 - \sigma + \sigma S)u(a_U) + \sigma I u(a_{I_k}^*)\big] - \gamma\delta\sigma I u_D + \nu C_{vac}(I) + \hat p_{-S} C(S - \Delta_S, I) + \hat p_{+I} C(S, I + \Delta_I^+) + \hat p_{-I} C(S, I - \Delta_I^-) - \big(r + \nu + \hat p_{-S} + \hat p_{+I} + \hat p_{-I}\big)C(S, I), \tag{D.14}$$
where for each direction we have $\hat p = p/\Delta_t$. We then iterate upon the policy function a_U. Beginning with an arbitrary guess a_U, we compute the planner's objective function by solving the linear system (D.14), replace a_U with the implied policy function from Lemma D.1, and repeat until convergence.

We consider the numerical method for computing the perfect recall Markov equilibrium. We define a grid for beliefs Σ_µ = {0, 1/N_µ, . . . , 1 − 1/N_µ, 1} for some integer N_µ ≥ 1. Let Σ := Σ_S × Σ_I × Σ_µ. For unknown agents, we must specify the transition probabilities for their beliefs and the probability with which they become known infected. The latter quantity is simply (D.4). We suppose that at an arbitrary (S, I, µ) ∈ Σ, there are four possible transitions, to (S − ∆_S, I, µ), (S, I − ∆_I^−, µ), (S, I + ∆_I^+, µ), and (S, I, µ − ∆_µ), with associated probabilities p_{−S}, p_{−I}, p_{+I}, and p_{−µ}. The local consistency requirements for the aggregate state are again satisfied if we choose p_{−S}, p_{−I}, and p_{+I} according to (D.3).
Using (C.2), the local consistency requirement for the belief variable is
$$-\Delta_\mu p_{-\mu} = -\Delta_t (1-\sigma)\mu\beta a_U\big((1-\sigma)\tilde a(S, I) + \sigma a_{I_k}^*\big) I + o(\Delta_t),$$
which will be satisfied if we choose $p_{-\mu} = \frac{\Delta_t}{\Delta_\mu}(1-\sigma)\mu\beta a_U\big((1-\sigma)\tilde a(S, I) + \sigma a_{I_k}^*\big) I$. The resulting Bellman equation and linear system are given in (D.16)–(D.18) below. We then iterate upon the policy function a_U: beginning with an arbitrary guess a_U, we compute the value function of the unknown agent by solving the linear system (D.18), replace a_U with the implied policy function from (D.17), and repeat until convergence.

We calibrate the flow utility from death u_D based on the case study of Sweden, which did not introduce mandatory lockdowns. For this purpose, we first obtain the daily cumulative number of reported cases and deaths for Sweden from Johns Hopkins University CSSE (Footnote 12). Let N_{C,t}, N_{D,t} be the cases and deaths up to date t. We divide these numbers by the population N = 10.38 × 10^6 to obtain the population shares of reported cases C_t := N_{C,t}/N and deaths D_t := N_{D,t}/N. We compute the case fatality rate as δ_{CFR,t} := D_t/C_t. Figure 9a shows the evolution of the case fatality rate. The fact that the CFR peaked at around 12% in May 2020 and has settled below 2% at the time of writing suggests that the reporting rate has changed significantly over time and that the reported cases are unreliable. We thus estimate the fraction of the infected population I_t using the accounting equation from the SIR model stated in (E.1). To control for the day-of-week effect (deaths appear to go unreported over the weekend), we take the 7-day moving average of (E.1), which is plotted in Figure 9b. The estimated peak prevalence is thus max_t I_t = 0.0663. Finally, we choose the value of u_D to match the peak prevalence in the model and obtain u_D = −12.22.

Figure 9: (a) case fatality rate (%) over time; (b) estimated prevalence over time.

As a robustness check, we also compute u_D from the value of a statistical life. Consider an individual consuming a constant flow normalized to 1 and facing a small probability d of death for one period. Letting β ∈ (0, 1) be the agent's discount factor, p be the willingness to pay to avoid the possibility of death, V be the continuation value of being alive, and V_D be the continuation value of being dead, by definition we have an indifference condition equating the utility of paying p with the expected utility under the death risk d. Solving this condition for u_D yields a value very similar to the one we obtained from the case study of Sweden.

References

Optimal management of a pandemic in the short run and the long run.
A simple planning problem for COVID-19 lockdown, testing, and tracing.
Choices, beliefs, and infectious disease dynamics.
Discounted dynamic programming.
Maximize utility subject to R ≤ 1: A simple price-theory approach to Covid-19 lockdown and reopening policy. NBER Working Paper 28093.
A mathematical analysis of public avoidance behavior during epidemics using game theory.
Public avoidance and epidemics: Insights from an economic model.
Systematic biases in disease forecasting: the role of behavior change.
Internal and external effects of social distancing in a pandemic.
Economic considerations for social distancing and behavioral based policies during an epidemic.
Adaptive human behavior in epidemiological models.
Rational epidemics and their public control.
Trading off consumption and COVID-19 deaths. Federal Reserve Bank of Minneapolis Quarterly Review.
Temporal dynamics in viral shedding and transmissibility of COVID-19.
Infection fatality rate of COVID-19 inferred from seroprevalence data.
A contribution to the mathematical theory of epidemics.
Integrating behavioral choice into epidemiological models of AIDS.
Optimal control of an epidemic through social distancing.
Numerical Methods for Stochastic Control Problems in Continuous Time.
Estimating the asymptomatic proportion of Coronavirus Disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship.
Estimates of serial interval for COVID-19: A systematic review and meta-analysis.
Game theory of social distancing in response to an epidemic.
Optimal quarantine programmes for controlling an epidemic spread.
Rational disinhibition and externalities in prevention.

Case 1: f′(a*_U) ≥ 0 = g′(a†_U). Extrapolate f, g for a > 1 to obtain functions f₁, g₁ as in (A.13). Clearly f₁, g₁ agree with f, g on A = [ā, 1]. Furthermore, f₁ is twice continuously differentiable and strictly concave on [ā, ∞); g₁ is continuously differentiable and strictly concave. Since f₁′(a*_U) ≥ 0 > −∞ = f₁′(∞), f₁ achieves a unique maximum at some a₁ ∈ [a*_U, ∞) satisfying f₁′(a₁) = 0. Since g′(a†_U) = 0, g₁ achieves a unique maximum at b₁ := a†_U with g₁′(b₁) = 0. Therefore a†_U = b₁ ≤ a*_U ≤ a₁. Noting that f₁″(x) = f″(1) for x > 1 and m := min_{a∈A} |u″(a)| > 0, we obtain the bound (A.14). By the definitions of f₁, g₁ we can compute f₁′ − g₁′; taking the absolute value of both sides and setting x = b₁ = a†_U ∈ [ā, 1], we obtain (A.15). Noting that f₁′(a₁) = g₁′(b₁) = 0, combining (A.13), (A.14), (A.15), and applying Lemma A.5 to (f, g) = (f₁′, g₁′) and (a, b) = (a₁, b₁), we obtain a bound on |a₁ − b₁|, which implies the upper bound in (3.17).

Case 2: f′(a*_U) = 0 ≥ g′(a†_U). Extrapolate f, g for a < ā to obtain functions f₂, g₂, which again agree with f, g on A. Furthermore, f₂ is continuously differentiable and strictly concave on (−∞, 1]; g₂ is twice continuously differentiable and strictly concave. By the same argument as in the previous case, we can derive (A.17) and (A.18). Noting that g₂′(a₂) = f₂′(b₂) = 0, combining (A.16), (A.17), (A.18), and applying Lemma A.5 to (f, g) = (g₂′, f₂′) and (a, b) = (a₂, b₂), we obtain a bound on |a₂ − b₂|, which implies the upper bound in (3.17).

If σ = 1, since I_u = 0, we have a†_U(z) = a*_U(z) by (3.17).

We consider the numerical method for computing the efficient action. Taking the limit as ∆ → 0 in the main text, one can show that the value function of the planner is of the form W(S, I, D) = Du_D − C(S, I), where C solves a Hamilton–Jacobi–Bellman equation in which C_{vac} is interpreted as the cost of the pandemic (in terms of utility) once the vaccine has arrived. We first construct a locally consistent Markov chain for the law of motion of the state variables; the local consistency requirements are given in (D.9). For an arbitrary (S, I) ∈ Σ there are three possible transitions, to (S − ∆_S, I), (S, I − ∆_I^−), and (S, I + ∆_I^+), with associated probabilities p_{−S}, p_{−I}, and p_{+I}. Inspection of the local consistency requirements (D.9) reveals that it suffices to set each transition probability proportional to the corresponding drift of the state, where p_{uk} is given by (D.4).
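As an illustration of this local consistency construction, here is a small Python sketch (the function name and argument list are ours) that maps the SIR drifts used throughout this appendix into one-step transition intensities p̂ = p/∆_t, in the spirit of (D.3) and (D.9):

```python
def transition_intensities(S, I, a_U, a_Ik, beta, gamma, sigma,
                           dS, dI_minus, dI_plus):
    """Locally consistent one-step intensities for the three moves
    (S - dS, I), (S, I - dI_minus), (S, I + dI_plus).

    A sketch in the spirit of (D.3)/(D.9): each intensity equals the
    corresponding SIR drift divided by the grid step in that direction,
    so the chain's conditional mean matches S-dot and I-dot to first
    order. The infection rate beta*S*a_U*((1-sigma)*a_U + sigma*a_Ik)*I
    follows the expressions used throughout this appendix.
    """
    new_inf = beta * S * a_U * ((1 - sigma) * a_U + sigma * a_Ik) * I
    p_minus_S = new_inf / dS          # susceptibles flow out at the infection rate
    p_plus_I = new_inf / dI_plus      # ... and into the infected pool
    p_minus_I = gamma * I / dI_minus  # infected flow out at the recovery rate
    return p_minus_S, p_minus_I, p_plus_I
```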
The Bellman equation for unknown agents is then the analogue of (D.5) with the belief µ as an additional state variable. Omitting terms independent of a_U, dividing by r∆_t, and sending ∆_t → 0, the maximization problem becomes (D.16). When the utility function takes the form (4.1), the first-order condition rearranges to $a_U = a_U^{FOC} := [-c]^{-1/\alpha}$, and the optimal choice of a_U is therefore
$$a_U = \mathbf{1}_{\{c \ge 0\}} + \big(1 - \mathbf{1}_{\{c \ge 0\}}\big)\max\big\{\bar a, \min\{1, a_U^{FOC}\}\big\}. \tag{D.17}$$
Note that the linear system characterizing the value function of the unknown agents is
$$0 = r u(a_U) + \frac{\Delta_\mu \sigma}{1-\sigma}\hat p_{-\mu} V_{I_k}(S, I) + \hat p_{-\mu} V_U(S, I, \mu - \Delta_\mu) + \hat p_{-I} V_U(S, I - \Delta_I^-, \mu) + \hat p_{+I} V_U(S, I + \Delta_I^+, \mu) + \hat p_{-S} V_U(S - \Delta_S, I + \Delta_S, \mu) - \Big(r + \nu + \hat p_{-S} + \hat p_{-I} + \hat p_{+I} + \hat p_{-\mu} + \frac{\Delta_\mu \sigma}{1-\sigma}\hat p_{-\mu}\Big) V_U(S, I, \mu). \tag{D.18}$$
We then iterate upon the policy function a_U: beginning with an arbitrary guess a_U, we compute the value function of the unknown agent by solving the linear system (D.18), replace a_U with the implied policy function from (D.17), and repeat until convergence.
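To make this iterate-solve-update loop concrete, here is a self-contained Python sketch on a deliberately simplified one-dimensional version of the unknown agent's problem. All names and parameter values are illustrative assumptions, not the paper's calibration: prevalence I decays exogenously at rate γ on an exponential grid, infection delivers a fixed continuation value V_I, and vaccine arrival (rate ν) resets the value to u(1) = 0. The structure of the loop, a linear solve for the value given the policy followed by a pointwise first-order-condition update, mirrors (D.7) and (D.17).

```python
import numpy as np

# Illustrative parameters (daily rates); NOT the paper's calibration.
r, nu = 0.05 / 365, 1 / 365.25     # discounting and vaccine arrival
beta, gamma = 0.15, 1 / 18         # transmission and recovery
alpha, abar = 2.0, 0.1             # CRRA exponent and activity lower bound
V_I = -5.0                         # continuation value upon infection (assumed)

def u(a):
    return (a ** (1 - alpha) - 1) / (1 - alpha)   # CRRA with u(1) = 0

# Exponential grid for prevalence I, as constructed above.
I = np.exp(np.linspace(np.log(1e-6), np.log(0.3), 60))
n = len(I)
dI = np.diff(I)
p_down = np.concatenate(([0.0], gamma * I[1:] / dI))  # downward intensity; absorbing at I[0]

a = np.ones(n)  # initial policy guess: full activity
for _ in range(200):
    # Value step: solve the linear system implied by the current policy.
    haz = beta * a * I                     # individual infection hazard
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n):
        A[i, i] = r + nu + haz[i] + p_down[i]
        if i > 0:
            A[i, i - 1] = -p_down[i]
        b[i] = r * u(a[i]) + haz[i] * V_I  # vaccine arrival contributes nu * 0
    V = np.linalg.solve(A, b)
    # Policy step: pointwise FOC  a^(-alpha) = beta * I * (V - V_I) / r,  clipped to [abar, 1].
    a_new = np.clip((beta * I * (V - V_I) / r) ** (-1 / alpha), abar, 1.0)
    if np.max(np.abs(a_new - a)) < 1e-10:
        break
    a = a_new
```

In this simplified setting V > V_I throughout, so the first-order condition always has a well-defined positive root, and the loop typically converges in a handful of iterations; the full two-dimensional (and, with beliefs, three-dimensional) schemes in this appendix follow the same alternation.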