key: cord-186927-b8i85vo7
authors: Hubert, Emma; Mastrolia, Thibaut; Possamai, Dylan; Warin, Xavier
title: Incentives, lockdown, and testing: from Thucydides's analysis to the COVID-19 pandemic
date: 2020-09-01
journal: nan
DOI: nan
sha:
doc_id: 186927
cord_uid: b8i85vo7

We consider the control of the COVID-19 pandemic via incentives, through either stochastic SIS or SIR compartmental models. When the epidemic is ongoing, the population can reduce interactions between individuals in order to decrease the rate of transmission of the disease, and thus limit the epidemic. However, this effort comes at a cost for the population. Therefore, the government can put into place incentive policies to encourage the lockdown of the population. In addition, the government may also implement a testing policy in order to know more precisely the spread of the epidemic within the country, and to isolate infected individuals. We provide numerical examples, as well as an extension to a stochastic SEIR compartmental model to account for the relatively long latency period of the COVID-19 disease. The numerical results confirm the relevance of a tax and testing policy to improve the control of an epidemic. More precisely, if a tax policy is put into place, even in the absence of a specific testing policy, the population is encouraged to significantly reduce its interactions, thus limiting the spread of the disease. If the government also adjusts its testing policy, less effort is required on the population side, so individuals can interact almost as usual, and the epidemic is largely contained by the targeted isolation of positively-tested individuals.

Starting around 430 BC, and known as the first historically well-documented epidemic, the Plague of Athens killed between a quarter and a third of Athenians, as reported by Thucydides.
He described the reaction of common Athenians and physicians of the time alike in these terms:

"For a while physicians, in ignorance of the nature of the disease, sought to apply remedies; but it was in vain, and they themselves were among the first victims, because they oftenest came into contact with it. No human art was of any avail, and as to supplications in temples, inquiries of oracles, and the like, they were utterly useless, and at last men were overpowered by the calamity and gave them all up." (Jowett [60, Volume I, Book II, pp. 135])

Thucydides analysed the consequences of this epidemic, and concluded that it had led to a moral upheaval for the Athenians, faced with the complete lack of any useful cure. Indeed, they realised that their traditionally used policies (mostly of a religious nature) to face tragedies had no effect on the epidemic, and that in the end, the disease was only stopped thanks to the development of a natural immunity within the population, during the first four years of the epidemic phase. Concerning now more specifically the spread of the disease itself, Thucydides wrote the following:

"Appalling too was the rapidity with which men caught the infection; dying like sheep if they attended on one another; and this was the principal cause of mortality. When they were afraid to visit one another, the sufferers died in their solitude, so that many houses were empty because there had been no one left to take care of the sick; or if they ventured they perished, especially those who aspired to heroism. For they went to see their friends without thought of themselves and were ashamed to leave them, at a time when the very relations of the dying were at last growing weary and ceased even to make lamentations, overwhelmed by the vastness of the calamity." (Jowett [60, Volume I, Book II, pp. 138])

In Thucydides's analysis of the Plague of Athens, we can isolate three fundamental questions that need to be addressed whenever an unknown epidemic occurs.
(1) How can one model a disease when one has, at best, sparse information on how it is spreading among the population?
(2) How can one solve the Gordian knot associated with interactions within the population: enjoying on the one hand the presence of others and avoiding isolation and solitude, and on the other hand potentially dramatically spreading the disease?
(3) How can governments and decision-makers incentivise people in order to better control the spread of the epidemic?

The first question is naturally linked to several strands of fundamental research, both for mathematicians and physicians, dealing with the problem of choosing a relevant epidemic model. Although the paternity of the first mathematical model designed to describe the evolution of an epidemic is often attributed to Bernoulli, who proposed one for smallpox as early as 1760 in [17], the real mathematical development of the theory had to wait for the 20th century, with fundamental contributions for the development of deterministic models by Hamer [52], Ross [88; 89; 90], Soper [98], and later Kermack and McKendrick [64], McKendrick [75], and Bartlett [13], who proposed one of the first general investigations of the evolution of deterministic interacting systems, which was then applied to epidemiology in Kendall [63]. The previous list is by no means comprehensive, and we refer the interested reader to the monograph by Bailey [12] for more historical details. It was rapidly noticed that deterministic models were insufficient to account for the uncertainty associated with the spread of the disease, and for the technical difficulties usually encountered in its detection. This acknowledgement helped nurture the development of stochastic models, whose first instances seem to trace back to McKendrick [74] and Greenwood [48].
For a precise comparison between deterministic and stochastic models in discrete-time settings, we refer our readers to Bailey [12], Bartlett [14], and Allen and Burgin [5], and to Allen [4] for more up-to-date references and an overview of recent epidemiological models.

We will now describe some specific types of epidemiological models, belonging to the general class of compartmental models, which will be at the heart of our work. The first one considers a sort of worst-case scenario, in which immunity is not developed after infection. This is especially relevant, for instance, for some sexually transmitted infections, or bacterial diseases. In such a case, infected people can either die of the infection, or be cured and therefore become once more susceptible to contracting the disease. Such models have been coined SIS (for Susceptible-Infected-Susceptible), and consider a population divided into two groups. Susceptible individuals interact with infected ones, and therefore move from one class to the other repeatedly. This model was first discussed in Weiss and Dishon [106], generalising a simpler version by Bailey [11], where it was linked to birth and death interacting processes. It was then further studied by Kryscio and Lefèvre [67], who computed the mean time of extinction of the infection. These discrete-time models were then extended by Nåsell [78; 79], who found the quasi-stationary distribution of a continuous-time stochastic SIS model with neither births nor deaths. More recently, Gray, Greenhalgh, Hu, Mao, and Pan [47] proposed to model a stochastic SIS process in continuous time, as a solution to a bi-dimensional SDE driven by a Brownian motion. This is the model we will follow in our SIS framework.

As an alternative to this quite pessimistic scenario, one can also assume that an immunity will appear after infection.
In that case, we can distinguish three classes: susceptible individuals who can contract the disease, infected people who are currently infected by the disease, and recovered people who have been cured and developed antibodies. Introduced originally by Kermack and McKendrick [64], this so-called SIR model was studied in depth by Anderson and May [8] in a deterministic setting, while stochastic perturbations were introduced by Beretta, Kolmanovskii, and Shaikhet [16]. Modelling a stochastic SIR process as a solution to an SDE driven by a Brownian motion was then proposed in Tornatore, Buccellato, and Vetro [103], and Jiang, Yu, Ji, and Shi [59]. This will be our model choice in this case.

For a more realistic modelling in the case of the COVID-19 disease, and especially to account for the relatively long latent phase of this disease, one could also assume that once a susceptible individual contracts the disease, he does not immediately become contagious. This led us to provide some extensions of our reasoning, in particular to a SEIR model, used by Bacaër [10], Dolbeault and Turinici [32] and Élie, Hubert, and Turinici [38] to model the COVID-19 disease. In this type of model, an intermediary class between susceptible and infected is introduced, usually referred to as the class of exposed individuals. This class makes it possible to model individuals who are infected but not yet infectious. As with the SIR/SIS models, another variation on this model considers that there is only a partial immunity, and individuals having recovered may revert to the class of susceptible: in this case, the model is usually coined SEIRS. In our framework we need to consider continuous-time stochastic versions of these models, and will therefore use the ones introduced in Mummert and Otunuga [77].

The question of the model being now settled, we can focus more on the second question raised above, which is linked to the spread of the disease through interactions within the population.
In classical SIS/SIR models, the infection grows into the population through an incidence rate β, and proportionally to the product of the number of susceptible and infected individuals. In the absence of a cure or a vaccine, this transmission rate appears therefore as the only control variable of individual or public institutions, in order to reduce the spread of an epidemic. Our take on the second question will therefore be from a control-theoretic perspective. At the heart of this approach is the simple idea that when faced with an epidemic, a perfectly rational population will try to find an equilibrium interaction rate, balancing the need to still connect with others, and the natural fear of spreading the infection itself. This is by no means a new point of view, and papers discussing the use of formal control theory in epidemiology can be dated back to the 70s, see among others Taylor [100] , Jaquette [58] , Sanders [92] , Gupta and Rink [50; 51] , Abakuks [1] , Morton and Wickwire [76] , Wickwire [107] , or Sethi and Staats [97] . More recently and closer to our purpose, we can also refer to Behncke [15] , Riley et al. [87] , who studied the impact of the control of transmission rate on the 2002-2004 SARS outbreak in Hong Kong and on the ways to interfere with the disease spreading, Piunovskiy and Clancy [84] , Hansen and Day [53] , Fenichel et al. [39] , Kandhway and Kuri [61] , Sélley, Besenyei, Kiss, and Simon [96] , and more broadly to the monograph by Lenhart and Workman [69] . An important, and slightly unrealistic aspect of the framework we just described is that the population is perfectly rational. 
Though it seems reasonable to assume that at least some individuals, being afraid of getting sick, will naturally decrease their interaction rates, it would however clearly be a stretch to assume that all individuals will have access to enough information, compared for instance to public institutions, for them to assess whether they are really acting in a way which is truly beneficial to the population as a whole. This is one of the reasons why quarantine and lockdown measures can, in addition, be introduced by governments in order to help slow down a pandemic, when neither a cure nor a vaccine has been developed, and there is a risk for medical facilities to be overwhelmed by a large influx of patients. As should be expected, a significant part of the recent literature on the COVID-19 pandemic has also adopted this point of view, and such measures as well as their medical, societal, and economic impacts are discussed by, among others, Alvarez, Argente, and Lippi [6], Anderson, Heesterbeek, Klinkenberg, and Hollingsworth [9], Colbourn [24], Del Rio and Malani [30], Djidjou-Demasse, Michalakis, Choisy, Sofonea, and Alizon [31], Élie, Hubert, and Turinici [38], Ferguson et al. [40], Fowler, Hill, Levin, and Obradovich [41], Grigorieva, Khailov, and Korobeinikov [49], Hatchimonji, Swendiman, and Seamon [54], Kantner [62], Ketcheson [65], Piguillem and Shi [83], Thunstrom, Newbold, Finnoff, Ashworth, and Shogren [101], Toda [102], or Wilder-Smith, Chiew, and Lee [108]. A telling example in the above list is the report of the Imperial College London by Ferguson et al. [40], which assesses the impact of non-pharmaceutical interventions to reduce the contact rate within a population for the COVID-19 pandemic. They distinguish between mitigation strategies (i.e. reduction of the peak hospitalisation levels by protecting the most susceptible individuals from getting infected, with shelter-in-place policies or social distancing), and suppression strategies (i.e.
aiming at reversing the disease growth with home isolation and social distancing for the entire population). It has been shown that mitigation policies 'might reduce deaths seen in the epidemic by up to half, and peak healthcare demand by two-thirds,' (Ferguson et al. [40, pp. 15]) but will lead to numerous deaths and saturation of health systems. The suppression strategy thus appears in this report as a preferred policy.

In light of the issues we have raised, a natural conclusion was, at least for us, that even if a control-theoretic approach to mitigate the impact of an epidemic is clearly desirable, there is a priori no evidence that in the face of clear public policies, a population will directly adopt a social distancing behaviour leading to an optimal transmission rate for the welfare of the society. Moreover, in the absence of a system allowing to actually keep track of the level of interaction within the population, governments are faced with a clear situation of moral hazard.¹ Consequently, an incentive policy should also be calibrated by governments in order to get a better control on the spread of the disease. This, as expected, leads us to our third question, which is where our approach departs significantly from the extant literature.

The COVID-19 pandemic has emphasised that a control policy has to be established with penalties, if lockdown measures are not respected by the population. However, such a policy is subject to two main issues. First, regardless of the amount of police checks being put into place, it is impossible for large countries to ensure the application of such isolation measures, and therefore it is unfeasible to have an absolute control on the behaviour of all individuals and their interactions. Second, a balance has to be struck between the severity of penalties or other types of incentives to help reduce the propagation of the disease, and the natural yearning of citizens for interactions.
To the best of our knowledge, no real calibration, founded on quantitative criteria, of appropriate incentive policies has been investigated in epidemiological models.² The present paper proposes to fill this gap by studying how a lockdown policy, seen as a suppression strategy to echo [40], can limit the number of infected people during an epidemic, with uncertainties on the actual number of affected individuals, and on their level of adherence to such a policy. More specifically, we aim at solving this moral hazard problem by finding (i) the best reaction effort of the population to reduce the interaction given a specific government policy; (ii) the optimal policy composed of an aggregated tax paid by the population at some fixed maturity, and a testing policy to reduce the uncertainty on the estimated number of infected people. As we already mentioned, this problem fits perfectly within the classical principal-agent framework with moral hazard, and boils down to finding a Stackelberg equilibrium between the principal (the leader, here the government) proposing a policy to an agent (the follower, here the population) to interact optimally in order to reduce the spread of the disease.

Principal-agent problems have a long history in the economics literature, dating back to at least the 1960s. It is not our goal here to review the whole literature on the subject, and we refer the interested reader to the seminal books by Laffont and Martimort [68], Bolton and Dewatripont [19], or Salanié [91]. For our purpose here, we will content ourselves with mentioning that this literature regained a strong momentum in the past two decades, when continuous-time models were developed and shown to be more flexible and tractable than the earlier static or discrete-time models. The main contributors in this regard are Holmström and Milgrom [55], Schättler and Sung [95], Sannikov [93], Williams [109], see also the monograph by Cvitanić and Zhang [27].
More recently, Cvitanić, Possamaï, and Touzi [28; 29] developed a general theory which makes it possible to tackle a great number of contract-theory problems, and which has since been extended and applied in many different situations.³ The basic idea is to identify a sub-class of contracts offered by the principal, which are revealing in the sense that the best-reaction function of the agent, and his optimal control, can be computed straightforwardly, and then to prove that restricting one's attention to this class is without loss of generality. With this approach, the problem faced by the principal now becomes a standard optimal control problem. There are however two fundamental assumptions for this theory to work, one of them being a specific structure condition, which enforces that the drift of the process controlled by the agent, meaning here for us the pair (S, I) giving the number of susceptible and infected people in the population, must be in the range of the volatility matrix of this process. This fundamental assumption is not satisfied in our model because, roughly speaking, there is only one Brownian motion driving the two processes, and we therefore cannot directly rely on existing results to tackle our problem. In these so-called degenerate problems, the literature has so far relied on the Pontryagin stochastic maximum principle, see for instance [56], but this requires extremely stringent assumptions, such as linear dynamics, which are automatically precluded for SIS/SIR models. We however prove that in our specific problem, it is possible to identify a whole family of contract representations (unlike the unique one in non-degenerate models), which is different from the one obtained in [29], but which still allows us to re-interpret the problem of the principal as a standard stochastic control problem. As far as we know, ours is the first paper in the literature which uses a dynamic programming approach to solve a degenerate principal-agent problem, and this constitutes our main mathematical contribution.

¹ [...] designed to help track down subsequent exposures after an infected individual is identified, see for instance Cho, Ippolito, and Yu [23], or Reichert, Brack, and Scheuermann [86]. Using these would in principle erase any possibility of moral hazard, provided that all the population uses the app, and that testing is organised on a massive scale. Even admitting that this would be the case, it remains that these tools have raised complex issues of privacy, see Ienca and Vayena [57] or Park, Choi, and Ko [82], and thus are still extremely polemical. In any case, the incentive-based approach we propose can always be considered as a useful complement to any other adopted strategy.

² There are a certain number of papers studying disease spreading through the lens of either moral hazard or adverse selection. However, these papers are mostly interested in livestock-related diseases, where producers naturally have private information on preventive measures they may have adopted, prior to contamination (ex ante moral hazard), and may or may not declare whether their herd is infected after contamination (ex post adverse selection). Such issues and the design of appropriate policies are considered for instance in Valeeva and Backus [104], Gramig, Horan, and Wolf [45; 46], but the problem is completely different from the one we are interested in. A notable exception can be found in the work of Carmona and Wang [22, Section 5], where the authors consider an application of their moral hazard theory for agents interacting through a finite state mean-field game to the containment of an epidemic.

³ [...]

Unfortunately, but of course expectedly for a relatively general framework, there is no way to extract from our model explicit results, especially on the shape of optimal controls.
It is therefore necessary to perform numerical simulations, by implementing semi-Lagrangian schemes, proposed for the first time by Camilli and Falcone [21], using some truncated high-order interpolators, as proposed by Warin [105]. The numerical results for both SIS and SIR models are conclusive, and confirm the relevance of a tax and testing policy to improve the control of an epidemic. First, in the benchmark case, considered as the case where the government does not put into place a specific policy, the efforts of the population are not sufficient to contain the epidemic. In our opinion, this supports the need for incentives. Indeed, if a tax policy is put into place, even in the absence of a specific testing policy, the population is then encouraged to significantly reduce its interactions, thus containing the epidemic until the end of the period under consideration. However, for a fixed containment period, the population relaxes its effort at the very end, leading to a resumption of the epidemic at that point. Finally, if the government also adjusts its testing policy, less effort is required on the population side, so individuals can interact almost in a business-as-usual fashion, and the epidemic is largely contained by the targeted isolation of positively-tested individuals.

We let $\mathbb{N}$ be the set of positive integers, $\mathbb{R}_+ := [0, \infty)$ and $\mathbb{R}_+^\star := (0, \infty)$. We fix a time horizon $T > 0$ corresponding to the lockdown length chosen, a priori, by the government. For every $n \in \mathbb{N}$, $\mathbb{S}^n$ represents the set of $n \times n$ symmetric positive matrices with real entries. We also denote by $\mathcal{C}^n$ the space of continuous functions from $[0, T]$ into $\mathbb{R}^n$, and simplify notations when $n = 1$ by setting $\mathcal{C} := \mathcal{C}^1$. The set $\mathcal{C}^n$ will always be endowed with the topology associated to the uniform convergence on the compact $[0, T]$.
For every finite dimensional Euclidean space $E$, and any $n \in \mathbb{N}$, we let $\mathcal{C}_b(E, \mathbb{R})$ be the space of bounded, continuous functions from $E$ to $\mathbb{R}$, and $\mathcal{C}^n_b(E, \mathbb{R})$ be the subset of $\mathcal{C}_b(E, \mathbb{R})$ of all $n$-times continuously differentiable functions on $E$ with bounded derivatives. For every $\varphi \in \mathcal{C}^2_b(E, \mathbb{R})$, we denote by $\nabla\varphi$ its gradient vector, and by $D^2\varphi$ its Hessian matrix.

In this section, in order to highlight the results we obtained throughout this paper, we present our model in an informal way. We thus detail the compartmental epidemic models we consider to represent the spreading of the virus, i.e., either a SIS or a SIR model. Indeed, at the beginning of an epidemic, it is unlikely that decision-makers, let alone the population, will have sufficient data to conclude that infected individuals become immune to the virus in question once they have recovered. This is particularly the case when the virus is new, as in the case of COVID-19. With this in mind, we concentrate our attention on two classical models in epidemiology: the SIS model, for the case where infected individuals do not develop an immunity to the disease, and can therefore re-contract it, and the SIR model in the opposite case. Our study is therefore able to deal with both models, and one of the important points will be to compare the results obtained for each of them. We insist on the fact that this entire section is informal, and the reader is referred to Section 4 for the rigorous mathematical study.

Some parameters will be common in the considered models. In particular, they both involve four non-negative parameters, $\lambda$, $\mu$, $\beta$ and $\gamma$. The parameters $\lambda$ and $\mu$ represent respectively the birth and (natural) death rates among the population, and therefore reflect the demographic dynamics unrelated to the epidemic⁴, while $\gamma$ represents the death rate associated to the disease. All these parameters are assumed to be constant and exogenous.
In most epidemic models, the parameter $\beta$, representing the transmission rate of the disease, is also assumed to be constant and exogenous. Nevertheless, in our framework, we will consider that $\beta$ is endogenous and time-dependent, in order to model the influence that the population can have on this transmission rate. More precisely, the transmission rate $\beta$ depends essentially on two factors: the disease characteristics and the contact rate within the population. Although the population cannot modify the disease characteristics, each individual can choose (or be incentivised) to reduce his/her contact rate with other individuals in the population. We will thus assume that the population can control the transmission rate $\beta$ of the disease, by reducing social interactions. With this in mind, we will denote by $\overline{\beta} > 0$ the constant initial transmission rate of the disease, i.e., without any control measures or effort from the population. Unfortunately, reducing social interactions is costly for the population. This cost takes into account both the obvious social cost, due to accrued isolation during the lockdown period, and an economic cost (loss of employment due to the lockdown, etc.). From now on, $\beta$ will thus denote the time-dependent transmission rate of the disease, controlled by the population. More precisely, we fix some constant $\beta^{\max} \geq \overline{\beta}$ representing the maximum rate of interaction that can be considered, and we define $B := [0, \beta^{\max}]$. The process $\beta$ will be assumed to be $B$-valued, and we will denote by $\mathcal{B}$ the corresponding set of processes.⁵

One of the two epidemic models we will study is inspired by the well-known SIS (Susceptible-Infected-Susceptible) compartment model, which mainly considers two classes S and I within the population: the class S represents the 'Susceptible', while the class I represents the 'Infected'.
In this model, during the epidemic, each individual can be either susceptible or infected, and $(S_t, I_t)$ denotes the proportion of each category at time $t \geq 0$. More precisely, as in classical SIS models, we assume that an infected individual returns, after recovery, to the class of susceptible individuals, and can therefore re-contract the disease. We denote by $\nu$ the associated rate, which is assumed to be a non-negative constant. We also take into account the demographic dynamics of the population, i.e., births and deaths (related to the considered disease or not), through the previously mentioned parameters $\lambda$, $\mu$ and $\gamma$. To sum up, the model is represented in Figure 1 below, and the (continuous-time) evolution of the disease is described by the following system [...] for an initial compartmental distribution of individuals at time 0, denoted by $(s_0, i_0) \in \mathbb{R}^2_+$, supposed to be known.

The second epidemic model we will focus on is the classical SIR (Susceptible-Infected-Recovered) compartment model. As in the SIS model, the class S represents the 'Susceptible' and the class I represents the 'Infected'. The SIR model is used to describe epidemics in which infected individuals develop immunity to the virus. This therefore involves a third class, namely R, representing the 'Recovered', i.e., individuals who have contracted the disease, are now cured, and therefore immune to the virus under consideration. We denote by $\rho$ the recovery rate, which is assumed to be a fixed non-negative constant. Therefore, during the epidemic, each individual can be either susceptible, infected or recovered, and $(S_t, I_t, R_t)$ denotes the proportion of each category at time $t \geq 0$. As in the previously described SIS model, we also take into account the demographic dynamics of the population, through the parameters $\lambda$, $\mu$ and $\gamma$.
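The SIS dynamics above, and the SIR dynamics built from the classes and rates just introduced, can be sketched as follows. This is only one standard way to write such systems from the parameters named in the text, not necessarily the authors' exact equations: it assumes that births enter the susceptible class at the constant rate $\lambda$, that new infections follow the mass-action term $\beta_t S_t I_t$, and that natural deaths affect every class at rate $\mu$.

```latex
% Sketch under the stated assumptions (constant inflow \lambda, mass-action incidence).
% SIS: recovered individuals return to S at rate \nu.
\begin{align*}
\mathrm{d}S_t &= \big(\lambda - \mu S_t - \beta_t S_t I_t + \nu I_t\big)\,\mathrm{d}t, \\
\mathrm{d}I_t &= \big(\beta_t S_t I_t - (\mu + \nu + \gamma) I_t\big)\,\mathrm{d}t.
\end{align*}
% SIR: recovered individuals move to R at rate \rho and stay immune.
\begin{align*}
\mathrm{d}S_t &= \big(\lambda - \mu S_t - \beta_t S_t I_t\big)\,\mathrm{d}t, \\
\mathrm{d}I_t &= \big(\beta_t S_t I_t - (\mu + \rho + \gamma) I_t\big)\,\mathrm{d}t, \\
\mathrm{d}R_t &= \big(\rho I_t - \mu R_t\big)\,\mathrm{d}t.
\end{align*}
```

Under these assumptions, summing the equations in either system shows that the total population evolves at rate $\lambda - \mu \times (\text{total}) - \gamma I_t$, that is, only through births, natural deaths, and disease-related deaths.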
To sum up, the epidemic scheme is represented in Figure 2, and the (continuous-time) evolution of the disease is described by the following system [...] for a given initial distribution of individuals at time 0, denoted by $(s_0, i_0, r_0) \in \mathbb{R}^3_+$ and assumed to be known.

The use of a deterministic model is widespread and generally justified for most epidemics. However, in our case study, and given what is currently happening in many countries, it appears that the number of infected individuals is not so simple to quantify and estimate. Indeed, without a large testing campaign, it seems complicated to know precisely the proportion of infected in the population. This is particularly true in the case of the COVID-19 epidemic: the absence of symptoms for a significant proportion of infected individuals leads to uncertainty about the actual number of susceptible and infected. As a consequence, it seems more realistic in our study to turn both the SIS and SIR deterministic controlled models previously described into stochastic controlled models. Concerning the deterministic part, the dynamics written in the previous systems remain identical. The volatility is partly represented by a fixed and deterministic parameter $\sigma > 0$, and by a time-dependent process $\alpha$, representing the actions of the government in terms of testing policy. More precisely, in our model, an increase of the number of tests in the population, represented by a decrease of the parameter $\alpha$, leads to a decrease in the volatility of the processes $S$ and $I$. Hence, both the population and the government have a clearer view of the number of susceptible and infected, and thus of the epidemic. However, this strategy comes at an economic cost for the government. We then assume that, without any specific effort of the government, $\alpha$ is equal to 1. We also fix a small parameter $\varepsilon \in (0, 1)$ to consider the subset $A := [\varepsilon, 1]$.⁶ The control $\alpha$ of the government is assumed to be $A$-valued, and we denote by $\mathcal{A}$ the corresponding set of processes.⁷

In addition, the testing policy allows the government to isolate individuals with positive test results. Therefore, the control $\alpha$ also has an impact on the effective transmission rate of the disease. More precisely, without any testing policy, i.e. $\alpha = 1$, the government cannot isolate contaminated individuals efficiently. In this case, all infected people spread the disease, and the transmission rate of the virus is given by $\beta$. Conversely, if a testing policy is put into place by the government, i.e. when $\alpha < 1$, we consider that individuals with positive test results can be isolated, and as a consequence fewer infected people spread the disease. In this case, the effective transmission rate is lower. We however do not assume that the impact of the testing policy on the volatility of $S$ and $I$, and on the transmission rate, has the same magnitude. Indeed, we expect a lower reduction of the effective transmission rate, compared to the volatility reduction for a given policy $\alpha$. This should be understood as a manifestation of the fact that it is easier to reduce the uncertainty on the number of infected people than to actually isolate individuals who have been identified as infected. We thus assume a linear dependency with respect to $\alpha$ for the volatility of both $S$ and $I$, while the effective transmission rate is chosen equal to $\beta\sqrt{\alpha}$, so that the number of infected people spreading the disease at time $t$ is given by $\sqrt{\alpha_t}\, I_t$.

We can now consider the SIS model previously defined by (2.1), but in its stochastic version: the number of infected, and therefore the number of susceptible, are impacted at each time $t$ by a Brownian motion $W_t$.
More precisely, the dynamic of the epidemic is now given by the following system Similarly to the SIS model, we consider that the deterministic model SIR described by (2.2) is also subject to a noise in the estimation of the proportion of susceptible and infected individuals. Inspired by the stochastic SIR model in Tornatore, Buccellato, and Vetro [103] , the dynamic of the epidemic is now given by the following system (2.4) Note that the proportion R of individuals in recovery is also uncertain, but only through its dependency with respect to I. More precisely, we assume that there is no uncertainty on the recovery rate ρ, implying that if the proportion of infected individual is perfectly known, the proportion of recovered is also known without uncertainty. This modelling choice is consistent with most stochastic SIR models, and emphasises that the major uncertainty in the current epidemic is related to the non-negligible proportion of (nearly) asymptomatic individuals. Indeed, an asymptomatic individual may be mis-classified as susceptible. This is also the case for an individual in recovery, who has been asymptomatic, but the uncertainty is solely related to the fact that he was not classified as infected when he actually was. In order to provide a unified framework for both the SIS and SIR models, and simplify the presentation, we will consider the following dynamic for the epidemic (2.5) Notice that to recover the SIS model, one has to set ρ = 0, and conversely, ν = 0 to obtain the SIR model. In addition to the choice of a testing policy, the government can also incentivise the population to limit their social interactions, in order to decrease the transmission rate of the disease, by introducing financial penalties. More precisely, at time 0, the government informs the population about its testing policy α ∈ A, as well as its fine policy χ ∈ C 8 , for the lockdown period [0, T ]. 
Knowing this, the population will choose an interacting behaviour according to the following rules: (i) an increase in the tax lowers its utility; (ii) an increase in the level of interaction (up to a specific threshold, namely the usual contact rate β̄) improves its well-being; (iii) the population is scared of having a large number of infected people. We stylise the previous facts by considering that the population solves the following optimal control problem, for a given pair (α, χ) of testing and fine policies (see Section 4.1.3 for a rigorous definition of the set C of admissible fine policies), where u : [0, T] × B × R_+ −→ R and U : R −→ R are continuous functions in all their arguments, and U is a bijection from R to R. Given a pair (α, χ), the set of optimal contact rates β⋆ will be denoted by B⋆(α, χ). The functions u and U should be interpreted as functions translating respectively the actual value of interaction from the point of view of the population, and the disutility associated with the fine. More precisely, the function U is assumed to be an increasing function, according to (i) above. Concerning the function u, it should be non-decreasing in the second variable up to β̄, and then non-increasing, modelling (ii) above. On the other hand, the function u is assumed to be non-increasing with respect to the proportion of infected individuals in the population. In particular, this allows us to take into account both the fear of the infection (as mentioned in (iii) above) and the cost that is incurred if an individual is infected. From the population's point of view, this cost is not actually expressed in terms of money, but mainly corresponds to medical side effects or general morbidity. We refer to Anand and Hanson [7], Zeckhauser and Shepard [110] and Sassi [94] for an introduction to QALY/DALY (Quality- and Disability-Adjusted Life-Year), the generic measures of disease burden used in economic evaluation to assess the value of medical interventions.
We choose to normalise the utility of the population to zero when there is no epidemic. In other words, if i_0 = 0, then I_t = 0 for all t ∈ [0, T], and thus the utility of the population should be equal to 0. With this in mind, we assume that U(0) = 0, which means that without a fine, the population does not suffer any disutility. Moreover, when there is no epidemic, the population should not reduce its social interaction, meaning that for all t ∈ [0, T], β_t = β̄, the usual contact rate. This leads us to assume that u represents the social cost of the lockdown policy, and thus should capture the two rules (ii) and (iii), as well as satisfy u(t, β̄, 0) = 0 for all t ∈ [0, T]. In particular, we could consider a separable utility function u of the form

where the function u_I : R_+ −→ R represents the fear of the infection for the population. In order to choose this function, we would like to model the fact that when the proportion of infected is close to 0, the population underestimates the epidemic, while when this proportion becomes large, the population becomes irrationally afraid. Therefore, we can consider a function independent of t, and take

Next, the function u_β represents the sensitivity of the population with respect to the initial transmission rate β̄ of the disease, i.e., without any lockdown measure. During the lockdown period, the social cost of distancing measures becomes more and more important for the population, and we thus expect the cost u_β to also reflect this sensitivity with respect to time. More precisely, we can consider two particular functions to model these stylised facts

for some η_p > 0, to insist on the fact that it is costly for the population to deviate from its usual contact rate, i.e.
its level of interactions in an epidemic-free environment, inducing the natural transmission rate of the disease β̄;

Finally, concerning the utility of the Agent with respect to the tax χ, we choose a mixed CARA-risk-neutral utility function

where θ_p > 0 is the risk-aversion of the population, and φ_p > 0, so that U(0) = 0, and U is an increasing and strictly concave bijection from R to R. For later use, we record that the inverse of U, denoted by U^(−1), has an explicit form (see Corless, Gonnet, Hare, Jeffrey, and Knuth [25] for more details about the LambertW function)

Before turning to the principal-agent problem itself, we aim at solving (4.7) for α = 1 fixed, and χ = 0, i.e. without tax and testing policy. Similar problems have been studied in, for instance, Kandhway and Kuri [61]. Mathematically speaking, the optimisation problem faced by the population without contract is informally given by

since we assumed U(0) = 0. Notice that by assumption on the function u, in the no-epidemic case, i.e., if i_0 = 0, the population should not make any effort, and therefore the optimal contact rate β⋆ over the period [0, T] is equal to the usual contact rate β̄. We thus consider in the following a fixed initial condition (s_0, i_0) ∈ (R_+)^2, which implies that for all t ∈ [0, T], both S_t and I_t are (strictly) positive. Without tax, the population's problem boils down to a standard control problem, with two state variables S and I. We will give the associated PDE in Section 2.4.1 below.

One of the main theoretical results of our study is given by Theorem 4.7. Informally, this theorem states that given an admissible contract, namely a testing policy α ∈ A and a tax χ ∈ C, there exist a unique Y_0 and Z such that the following representation holds

where β⋆ is the unique optimal contact rate for the population.
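Since the inverse of U is recorded above for later use, a small numerical sketch may help to see how it can be evaluated. The displayed formula for U is not reproduced in this text, so the sketch assumes the form U(x) = (1 − exp(−θ_p x))/θ_p + φ_p x, which is one mixed CARA-risk-neutral function satisfying the stated properties (U(0) = 0, increasing, strictly concave, bijective). The function names, the Newton iteration used for the Lambert W function, and the parameter values in the usage lines are all illustrative assumptions.

```python
import math


def lambert_w0(a, tol=1e-14, max_iter=100):
    """Principal branch W_0(a) for a > 0, via Newton iteration on w * exp(w) = a."""
    if a <= 0.0:
        raise ValueError("this sketch only handles a > 0")
    w = math.log1p(a)  # starting point at or to the right of the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - a) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def utility(x, theta, phi):
    """ASSUMED form of U: CARA part plus a risk-neutral (linear) part."""
    return (1.0 - math.exp(-theta * x)) / theta + phi * x


def utility_inverse(y, theta, phi):
    """Inverse of the assumed U, obtained by solving U(x) = y with Lambert W.

    Writing z = theta * x and c = theta * y - 1, the equation becomes
    phi * z - exp(-z) = c, whose solution is z = W_0(exp(-c / phi) / phi) + c / phi.
    """
    c = theta * y - 1.0
    a = math.exp(-c / phi) / phi
    return (lambert_w0(a) + c / phi) / theta


# Hypothetical parameter values, for illustration only.
theta_p, phi_p = 1.0, 0.5
for y in (-1.0, 0.0, 0.3, 1.0):
    x = utility_inverse(y, theta_p, phi_p)
    print(y, x, utility(x, theta_p, phi_p))
```

The exponential inside `utility_inverse` can overflow for very negative y or very small φ_p; a production implementation would work with logarithms instead, which this sketch does not attempt.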
More precisely, we can state that for (Lebesgue-almost every) t ∈ [0, T]

Under some assumptions for existence and smoothness of the inverse of the function U, the previous equation gives a representation for the tax χ. Based on (2.10), the tax χ will be indexed on the variation of the proportion of infected I, through the stochastic integral ∫_0^· Z_s dI_s, and not on the variation of susceptible S (though it is indexed on S through the dt integral). Nevertheless, using the link between the dynamics of I and S, we can write a representation equivalent to (2.10)

Through this equation, we can state that the tax can be indexed on S instead of I. Therefore, given the strong link between the number of susceptible and the number of infected, it is sufficient to index the tax on only one of these two quantities, and one can therefore choose indifferently to index the tax χ on the variations of I or S.

The reader familiar with contract theory in continuous-time will have noticed that the previous representation for the tax χ is not exactly the expected one. Indeed, referring for instance to Cvitanić, Possamaï, and Touzi [29], the contract is usually the sum of three components: (i) a constant similar to Y_0, chosen by the Principal in order to satisfy the participation constraint of the Agent; (ii) an integral with respect to time t ∈ [0, T] of the Agent's Hamiltonian; (iii) a stochastic integral with respect to the controlled process, i.e., in our framework, (S, I). Neither the representation (2.10) nor (2.11) is, a priori, of this form. This difference is due to the fact that the dynamics of (S, I) is degenerate. More precisely, there is a fundamental structure condition in [29] requiring that the drift of the output process belongs to the range of its volatility. In words, defining for (s,

which is obviously impossible here.
Therefore, we cannot use directly any existing result in the literature, and we should not expect, a priori, to be able to obtain a contract representation similar to the one in [29] , nor that the so-called dynamic programming approach will prove effective in our case. Indeed, as far as we know, such degenerate models have only been tackled using the stochastic maximum principle, see Hu, Ren, and Touzi [56] . However, and somewhat surprisingly, the form we exhibit for the tax is actually strongly related to the usual representation. The reason for this is twofold. First, up to the sign, the volatilities in the dynamics of both S and I are exactly the same. Second, both the processes S and I are driven by the same Brownian motion W . Therefore, intuitively, in order to provide incentives to the population, the government can afford to index the tax on only one of the two processes. Mathematically, it is also straightforward to show that given an arbitrary decomposition of the process Z in Equation (2.10) of the form Z =: where H is the Hamiltonian of the population, and this is exactly the general form provided in [29] . The main difference is that in [29] , Z s and Z i are both uniquely given, while in our representation, only their difference actually matters. Hence, there is an infinite number of possible representations for the tax χ in our degenerate model. As already explained, the government can choose the tax χ ∈ C paid by the population together with the testing policy α ∈ A. It aims at minimising the number of infected people until the end of the quarantine period, and we informally write its minimisation problem as where c : The function c denotes the instantaneous cost implied by the proportion of infected people during the quarantine period, and is thus assumed to be non-decreasing, while the function k represents the cost of the testing policy. In addition, the set Ξ takes into account the so-called participation constraint for the population. 
This means that the government is benevolent, which translates into the fact that it has committed to ensure that the living conditions of the population do not fall below a minimal level. Mathematically, the government can only implement policies

Concerning the cost function k associated with the testing policy, we recall that α = 1 means no testing policy, so no cost for the government. As soon as α is different from 1, the cost has to be higher. We may consider the following function for the testing policy k, for some η_g > 0 and κ_g > 0,

This function highlights the fact that it is very costly, if not impossible, to eliminate the uncertainty associated with the epidemic. Indeed, in a relatively populous country, it seems impossible to develop a testing policy sufficient to know exactly the proportion of susceptible and infected people.

Another interesting case to compare our results with corresponds to the so-called first-best case. This is the best possible scenario where the government can enforce whichever interaction rate β ∈ B it desires, and simply has to satisfy the participation constraint of the population. From the practical point of view, this could correspond to a situation where the government would be able to track every individual and force them to stop interacting. The problem faced by the government is then

In this section, we present the main theoretical results obtained when the dynamic of the epidemic is given by (2.5). Recall that, in order to consider the SIS or the SIR model, one has to set respectively ρ = 0 or ν = 0. As mentioned in Section 2.3.2, the benchmark problem is a standard Markovian stochastic control problem, whose

We then have the natural identification

where v solves the associated Hamilton-Jacobi-Bellman (HJB for short) equation

for a particular function F defined by (4.4) in Section 4.1.
Note that if we consider separable utilities with the form In particular, the optimal interaction rate is given by To find the optimal interaction rate β ∈ B, as well as the optimal contract (α, χ) ∈ A × C, in the first-best case, one has to solve the government's problem defined by (2.14). Mathematical details are postponed to Section 4.3.3, but we present here an overview of the main results. To take into account the inequality constraint in the definition of V P,FB 0 , one has to introduce the associated Lagrangian. Given a Lagrange multiplier > 0, we first remark that the optimal tax is constant and given by Then, defining for any > 0 we have Note that V 0 ( ) is the value function of a standard stochastic control problem, and therefore we expect to have In particular, if we consider separable utilities with the forms (2.17), for a given testing policy α ∈ A and a Lagrange multiplier > 0, the optimal interaction rate is given for all t recalling that b • is defined by (2.18). Thanks to the reasoning developed in Section 4, we are able to determine the optimal design of the fine policy, the optimal testing policy, as well as the optimal effort of the population. First, as informally explained in Section 2.3.3, to implement a tax policy χ ∈ C, the government only needs to choose a constant Y 0 and a process Z. Given these two parameters, we can state that the optimal contact rate for the population is defined by It thus remains to solve the government's problem in order to determine the optimal choice of Y 0 and Z. 
The reader is referred to Section 4.3 for the rigorous government's problem, but, to summarise the results, the optimal process Z as well as the optimal testing policy α are determined so as to maximise the government's Hamiltonian, given by Finally, it remains to solve numerically the following HJB equation, for all t ∈ [0, T ] and x : where the natural domain over which the above PDE must be solved is The results presented in Section 2.4 are quite theoretical: except for the optimal transmission rate, it is complicated to obtain explicit formulae for the other variables sought, in particular for the optimal testing policy α, even if we consider separable utility functions as in (2.17) . It is therefore necessary to perform numerical simulations to evaluate the optimal efforts of the population and the government, as well as the optimal tax policy. Given the similarities in the results between the SIS and SIR models, only those related to the SIR model are presented in this section. The reader will find in Appendix A the results corresponding to the SIS model. The following numerical experiments are implemented using the utility and cost functions respectively mentioned in Example 2.1 for the population and in Example 2.2 for the government. To summarise, we choose for the population: Table 1 . In addition, the set of parameters used for the simulations of the epidemic dynamics given by (2.5) are provided in Table 2 and are inspired by those chosen by Élie, Hubert, and Turinici [38] . Recall that the parameter β denotes the usual contact rate within the population, before the beginning of the lockdown. In other words, β represents the initial and effective transmission rate of the disease, without any specific effort of the population. The associated reproduction number R 0 , commonly defined by R 0 := β/(ν + ρ) in the literature on epidemic models, is equal to 2.0, and is thus in the confidence interval of available data, see for example Li et al. [70] . 
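As a quick consistency check, the reported value of R_0 can be recovered from the parameter values quoted elsewhere in this section for the SIR case, namely a usual contact rate of 0.2 (written β̄ below), ν = 0 and ρ = 0.1. Note that this definition of R_0 does not include the disease-related death rate γ. The second identity uses the classical herd-immunity threshold 1 − 1/R_0, a standard result for SIR-type models that is not derived in this text.

```latex
\[
R_0 := \frac{\bar\beta}{\nu + \rho} = \frac{0.2}{0 + 0.1} = 2,
\qquad
1 - \frac{1}{R_0} = 1 - \frac{1}{2} = 50\%.
\]
```

The second value agrees with the 50% herd-immunity level quoted later in the discussion of the lockdown policy.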
Recall that the parameters λ and µ represent respectively the birth and (natural) death rates among the population, and therefore reflect the demographic dynamics unrelated to the epidemic, while γ represents the death rate associated to the disease. To simplify, and since the duration of the COVID-19 epidemic should be relatively short in comparison to the life expectancy at birth, we choose to disregard the demographic dynamics by setting λ = µ = 0. In contrast, we set γ = 1%, since the mortality associated with the disease appears to be significant. Finally, recall that the parameters ν and ρ correspond respectively to the recovery rates in the SIS and SIR models, i.e., the inverse of the virus contagious period. Since we want to consider here a SIR dynamic, we let ν = 0 and ρ = 0.1, to account for the average 10-day duration of COVID-19 disease. When not explicitly specified, the simulations presented in this section are performed with the sets of parameters described in Tables 1 and 2 . However, the parameters used to describe in particular the utility and cost functions of the population and government are set in a relatively arbitrary way. To actually estimate these parameters would require an extensive sociological and economic study, that we do not presume to be able to perform at this stage, and linking, for example, the population's costs to the DALY/QALY concepts already mentioned, and the government's costs to those of the health care system and its possible congestion. Moreover, there is considerable uncertainty in the medical literature on the choice of all parameters used to describe the dynamics of the epidemic, in particular because the COVID-19 is a new type of virus and therefore we do not have sufficient hindsight to reliably estimate its characteristics. It will therefore be necessary to study the sensitivity of the results obtained with respect to the selected parameters. 
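To make the role of these parameters concrete, here is a minimal Euler-scheme sketch of the controlled stochastic dynamics. The displayed system (2.5) is not reproduced in this text, so the sketch assumes a form consistent with the description given earlier: effective transmission rate β √α, volatility σ α S I entering S and I with opposite signs through the same Brownian motion, and a recovered compartment driven only by ρ I. The function name, the value of σ, and the initial condition are hypothetical; the values β̄ = 0.2, ρ = 0.1, γ = 1%, λ = µ = ν = 0, T = 200 and 200 time-steps are those quoted in this section.

```python
import math
import random


def simulate_sir(s0, i0, r0, contact_rate, testing, *, sigma, rho, gamma,
                 nu=0.0, lam=0.0, mu=0.0, T=200.0, n_steps=200, rng=None):
    """Euler scheme for an ASSUMED form of the unified SIS/SIR dynamics.

    contact_rate(t, s, i) plays the role of the population's control beta_t,
    testing(t, s, i) the role of the government's control alpha_t in [eps, 1].
    Returns a list of tuples (t, S_t, I_t, R_t).
    """
    rng = rng or random.Random()
    dt = T / n_steps
    s, i, r = s0, i0, r0
    path = [(0.0, s, i, r)]
    for k in range(n_steps):
        t = k * dt
        b = contact_rate(t, s, i)
        a = testing(t, s, i)
        dw = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment
        new_infections = b * math.sqrt(a) * s * i
        noise = sigma * a * s * i * dw          # same noise, opposite signs
        ds = (lam - mu * s + nu * i - new_infections) * dt + noise
        di = (new_infections - (mu + nu + gamma + rho) * i) * dt - noise
        dr = (rho * i - mu * r) * dt
        # Crude projection keeping proportions non-negative (sketch only).
        s, i, r = max(s + ds, 0.0), max(i + di, 0.0), max(r + dr, 0.0)
        path.append((t + dt, s, i, r))
    return path


# Uncontrolled epidemic: usual contact rate, no testing (alpha = 1).
# sigma = 0.1 and the initial condition are hypothetical choices.
path = simulate_sir(0.99, 0.01, 0.0,
                    contact_rate=lambda t, s, i: 0.2,
                    testing=lambda t, s, i: 1.0,
                    sigma=0.1, rho=0.1, gamma=0.01,
                    rng=random.Random(0))
peak_time, _, peak_infected, _ = max(path, key=lambda p: p[2])
print(peak_time, peak_infected, path[-1][1])
```

Passing the controls as functions of (t, s, i) keeps the sketch usable both for the uncontrolled case shown here and for a feedback control obtained from a numerical solution of the HJB equation.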
Finally, it should be remembered that, in contrast to usual principal-agent problems, the government implements a mandatory tax, which the population cannot refuse. Nevertheless, we consider that the government is benevolent, in the sense that it still wishes to ensure that the utility of the population remains above a certain level, denoted by v. To fix this level, we assume that the government wants to ensure at the very least to the population the same living conditions it would have had in the event of an uncontrolled epidemic, i.e., without any effort on the part of either the population or the government, meaning that β equals the usual contact rate β̄, α = 1 and χ = 0. Mathematically, this is equivalent to the following, since u is separable of the form (2.7), such that for all t ∈ [0, T], u_β(t, β̄) = 0 and u_I satisfies (2.8)

Notice that the reservation utility v is given by the worst case scenario, without any sanitary precaution from either the population or the government. This level may be judged too severe, and one could consider a model where the government is more benevolent. In particular, one could set v closer to the value that the population achieves in the benchmark case, i.e., when it makes optimal efforts in the absence of government policy. Nevertheless, the value of v should not be of major importance, since it should only impact the initial value Y_0.

In order to solve Equation (2.16) corresponding to the population's problem in the benchmark case, as well as Equation (2.22) for the government's problem, we need a method able to deal with degenerate HJB equations. We choose to implement semi-Lagrangian schemes, first proposed in Camilli and Falcone [21]. These are explicit schemes using a given time-step ∆t, and requiring interpolation on the grid of points where the equation is solved.
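The idea of an explicit semi-Lagrangian step with grid interpolation can be sketched on a toy one-dimensional problem. The code below is not the scheme used for Equations (2.16) or (2.22), nor the StOpt implementation: it treats a generic HJB equation −∂_t v − max_b { µ(x, b) ∂_x v + ½ σ(x, b)² ∂_xx v + f(x, b) } = 0 with terminal condition g, on a fixed bounded grid, with linear interpolation and constant extension outside the grid. All names and the test problem are hypothetical.

```python
import math


def interp_linear(values, x_min, h, x):
    """Piecewise-linear interpolation on a uniform grid, constant outside it."""
    pos = (x - x_min) / h
    if pos <= 0.0:
        return values[0]
    if pos >= len(values) - 1:
        return values[-1]
    k = int(pos)
    w = pos - k
    return (1.0 - w) * values[k] + w * values[k + 1]


def solve_hjb(x_min, h, n, T, n_steps, controls, drift, vol, running, terminal):
    """Explicit semi-Lagrangian scheme, stepping backwards from T to 0.

    At each grid point and for each control b, the second-order operator is
    replaced by the average of the next-step value at the two points
    x + drift * dt +/- vol * sqrt(dt), obtained by interpolation.
    """
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    xs = [x_min + k * h for k in range(n)]
    v = [terminal(x) for x in xs]
    for _ in range(n_steps):
        v_next, v = v, []
        for x in xs:
            best = -math.inf
            for b in controls:
                mean = x + drift(x, b) * dt
                spread = vol(x, b) * sqrt_dt
                up = interp_linear(v_next, x_min, h, mean + spread)
                down = interp_linear(v_next, x_min, h, mean - spread)
                best = max(best, running(x, b) * dt + 0.5 * (up + down))
            v.append(best)
    return xs, v


# Toy problem: maximise E[X_T] with dX = b dt + 0.2 dW and b in {0, 1}.
# The exact value is x + T, so the value at x = 0 should be close to 0.5.
xs, v = solve_hjb(-5.0, 0.05, 201, 0.5, 10, [0.0, 1.0],
                  drift=lambda x, b: b, vol=lambda x, b: 0.2,
                  running=lambda x, b: 0.0, terminal=lambda x: x)
print(xs[100], v[100])
```

The constant extension outside the grid pollutes the solution near the boundaries, and this error travels inwards at each time-step, which illustrates why the choice of the resolution domain and the treatment of its boundaries require care.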
The interpolation required by semi-Lagrangian schemes can be either linear, as proposed in [21], or based on some truncated higher-order interpolators, as proposed by Warin [105], leading to convergence of the numerical solution to the viscosity solution of the problem. A key point here, which makes the approach delicate, is that the domain over which the PDEs are solved is unbounded. It is therefore necessary to define a so-called resolution domain, over which the numerical solution will be actually computed, which on the one hand must be large enough, and which on the other hand creates additional difficulties in the treatment of newly introduced boundary conditions. In order to treat these issues, we use two special tricks: (i) picking randomly the control in (2.5) for the benchmark case, and in (4.16) for the general case, and using the forward SDE with an Euler scheme, a Monte-Carlo method allows us to get an envelope of the reachable domain with a high probability at each time-step. Then, given a discretisation step fixed once and for all, the grid of points used by the semi-Lagrangian scheme is defined at each time-step with bounds set by the reachable domain estimated by Monte-Carlo. Therefore, at time step 0, the grid is only represented by a single mesh, while the number of meshes can reach millions near T; (ii) since the scheme is explicit, starting at a given point at date t, it only requires some discretisation points at date t + ∆t, and a modification of the general scheme is implemented to use only points inside the grid at date t + ∆t, as shown in [105]. Lastly, in dimension 3 or above, parallelisation techniques defined in [105] have to be used in order to accelerate the resolution of the problems. The numerical results below are obtained using the StOpt library, see Gevret, Langrené, Lelong, Warin, and Maheshwari [43].

We first focus on the benchmark case, when the government does not implement any particular policy to tackle the epidemic, i.e., α = 1 and χ = 0.
Recall that in this case, the population's problem is given by (2.9), and is then equivalent to solving the HJB equation (2.16). For our simulations, we choose a number of time-steps equal to 200, and a discretisation step equal to 0.0025. The interpolator is chosen linear, and the optimal command b° used to maximise the Hamiltonian is discretised with 200 points given a step discretisation of 0.005. Once the PDE is solved, a forward Euler scheme is used to obtain trajectories of the optimally controlled S and I, meaning with the optimal transmission rate b°. In order to check the accuracy of the method described in Section 3.2, we implement two versions of the resolution: (i) the first version is a direct resolution of (2.16) with the Hamiltonian (2.15); (ii) the second one relies on a change of variable. More precisely, we consider (s, x := s + i) as state variables, instead of (s, i), and then solve the problem (2.16), but with a slightly modified Hamiltonian to take into account this change of variable

The advantage of the second representation is that the dispersion of I_t + S_t is zero and thus smaller than the one of I_t, leading to the use of grids with a smaller number of points.

First, to give an overview of the overall trend, we plot, on Figure 3, 100 trajectories of the optimal interaction rate β⋆, and the associated proportions S_t and I_t of susceptible and infected, using the resolution method (i) mentioned above, i.e., with state variables (S, I). For more accurate trajectories, we compare on Figure 4 two different trajectories of the optimal interaction rate β⋆, together with the corresponding dynamic of the proportion I of infected. For these two simulations, we compare the results given by the two aforementioned methods. More precisely, while the blue curve is obtained through the direct resolution, the orange one results from the second method, i.e., with state variables (S, S + I).
Finally, on Figures 5 and 6, we test the influence of the parameter τ_p by setting τ_p = 0.01, instead of 0.

[Figure panels: Proportion I of infected; Proportion S of susceptible.]

Voluntary lockdown of the population. As expected, the optimal behaviour β⋆ starts close to the usual contact rate β̄, and then decreases as the disease spreads in the population. More specifically, two waves of effort can be observed: the first one delays the acceleration of the epidemic, and the second, generally more significant, takes place during the peak of the epidemic. Approaching the fixed maturity, individuals come back to their usual behaviour β̄. However, even if the population chooses to decrease the interaction rate among individuals, the range of β⋆ stays quite small, with minimum 0.16 and maximum β̄ = 0.2.

[Figure panels: Optimal effort; % of infected.]

Sensitivity with respect to the method. As we can notice in Figure 4 (top), the optimal effort obtained for these two simulations exhibits the same features as those previously described. Moreover, the blue curve and the orange curve, representing respectively the results of the two aforementioned methods, are very close, except at the beginning of the time interval, probably because of the very small initial value i_0. Nevertheless, we can see on the bottom graphs that the two methods lead to the same dynamic for the proportion of infected, since the two curves, blue and orange, are almost superposed. Therefore, a small error on the computation of the optimal effort at the beginning does not impact the optimally controlled trajectories of I. The resolution with respect to (s, s + i) seems to be more regular, and may give a command closer to the analytical one.

The fear of the infection is not enough. Without a proper government policy to encourage the lockdown, the natural reduction of the interaction rate among individuals is not sufficient to contain the disease, so that it spreads with a high infection peak, up to 0.175.
As a result, even if the epidemic appears to be over at the end of the time interval under consideration, between 60 and 80% of the population has been contaminated by the virus, since the proportion S at time T = 200 lies between 0.2 and 0.4. In conclusion, without some governmental measures, the fear of the epidemic is not sufficient to encourage the population to make sufficient effort, in order to significantly reduce the rate of transmission of the disease. The introduction by the government of an effective lockdown policy together with an active testing policy should improve the results of the benchmark case, in particular by reducing the peak of infection and the total number of infected people over the considered period.

[Figure 5: The optimal transmission rate β⋆ and the resulting proportion I with τ_p = 0.01. Comparison of the two methods on two simulations. Panels: Optimal effort; % of infected; Simulation 1; Simulation 2.]

The lockdown fatigue. By setting τ_p = 0.01 instead of 0, the cost of the lockdown from the population's point of view is now increasing with time. This allows us to take into account the possible fatigue the population may suffer if the lockdown continues for too long. As expected, by comparing Figures 3 and 6, the impatience of the population gives higher values of the optimal interaction rate β⋆. Moreover, comparing Figures 4 and 5, we can see that in both simulations, the second wave of effort is more impacted (i.e., the contact rate is less reduced) by the impatience of the population than the first one.

[Figure panels: Optimal control β; Proportion I of infected; % S of susceptible.]

We focus in this section on the tax policy, by assuming that A = {1}. In words, we assume that the government does not implement a specific testing policy, which means α = 1 as in the benchmark case, but only encourages the population to lockdown through the tax policy χ.
In such a situation, i.e., without a proper testing policy, the detection and hence the isolation of ill people becomes very difficult. The only possibility to regain control of the epidemic is to reduce the interaction rate of the population. This case is interesting, as it corresponds to the lockdown policy that most western countries implemented in 2020, when faced with the COVID-19 disease, while a very small number of tests was available. Indeed, most countries put in place systems of fines, or even prison sentences, to incentivise people to lockdown. Although the penalties for non-compliance are not as sophisticated as in our model, most governments did adapt the level of penalties according to the stage of the epidemic: higher fines during periods of strict lockdown (hence at the peak of the epidemic), or in case of recidivism, for example. This reflects the adjustment of sanctions in many countries according to the health situation, and therefore a notion of dynamic adaptation to circumstances, which is exactly what is suggested by our tax system. Though it is clear that our model is different from reality, since we consider a fine/compensation, paid at some terminal time T, and equal for each individual, whereas in most countries, the fine is paid by a particular individual who has not complied with the injunctions, we still believe it allows us to highlight sensible guidelines.

The numerical approach is highly similar to the method used to solve the benchmark case. One difference is that we have to estimate the reservation utility of the population, namely v, given by (3.2). Using a Monte-Carlo method and an Euler scheme with a time-discretisation of 200 time-steps and 10^6 trajectories, we obtain an approximated value v = −0.02937.
Then, we can solve (2.22) through the aforementioned semi-Lagrangian scheme, with 200 time steps, as well as a step discretisation for the grid in (s, i, y) corresponding to (0.0025, 0.0025, 0.005), leading to a number of meshes at maturity equal to 250 × 70 × 800 (for Z_max = 30).

A last technical point concerns the domain of the control Z. Although this control of the government, used to index the tax on the proportion of infected, can take high values, we have to bound its domain in order to perform the numerical simulations. We choose to restrict its domain to an interval [−Z_max, Z_max], and consider a discretisation step equal to 0.5. One would naturally expect that a larger choice would lead to somewhat better solutions. However, this neglects a fundamental numerical issue: large values of Z increase the numerical cost, as they enlarge the volatility of the process Y (given by σZIS). As such, since the volatility cone becomes larger, it is necessary to sample a much larger grid in order to be able to cover the region where Y will most likely take its values. Too large values of Z_max therefore become numerically intractable, unless one is willing to sacrifice accuracy. A balance needs to be struck, which is why we capped Z_max at 30. A sensitivity analysis with respect to variations of Z_max is provided in Figure 8. Though the trajectories of the optimal Z are somewhat impacted, Figure 7 confirms that this has minimal impact on the trajectories of I itself. Indeed, for different values of Z_max, the shape of the parameter Z remains the same. More importantly, we will see that the paths of the optimal transmission rate, namely β⋆, associated with different Z_max, are almost superposed. As a consequence, the dynamic of I also follows almost the same paths independently of Z_max.
First, we present in Figure 7 different trajectories of the proportion I of infected when the government implements the optimal tax policy, and compare them to the trajectories obtained in the benchmark case. As mentioned before, we also want to study the sensitivity with respect to the arbitrary bound Z_max, and we thus represent the paths of I in three cases, in addition to the benchmark case: for Z_max = 10 (orange curves), Z_max = 20 (green), and Z_max = 30 (red). Then, the corresponding simulations of the optimal control Z of the government, used to index the tax on the proportion of infected, are given in Figure 8. We compare the optimal controls β⋆ and Z for the tax policy with different lockdown time periods in Figure 9. Finally, Figure 10 regroups the simulations of the optimal transmission rate β⋆ obtained with the tax policy, and compares them to β° obtained in the benchmark case.

The epidemic is at best contained, and at worst delayed. Compared to the benchmark case, we observe in Figure 7 that the optimal lockdown policy prevents the epidemic peak in most cases by keeping the epidemic at low levels of infection during the lockdown period. Therefore, the government has more time to prepare for a possible infection peak after the lockdown, specifically to increase hospital capacity and provide safety equipment (surgical masks, hydro-alcoholic gel, respirators...). The government can also use this time to fund the development of tests to detect the virus, as well as the research on a vaccine or a remedy for the related disease. Nevertheless, we can see that at the end of the lockdown period, in many cases the virus is not eradicated and the epidemic may even restart. This is particularly well illustrated by Figure 11, representing 500 trajectories of I, obtained with the optimal control.
Such a phenomenon can be understood as follows: the lockdown slows down the epidemic, so that only a very small proportion of the population has been infected and is therefore immune. We thus cannot rely on herd immunity, which is reached if at least 50% of the population has been contaminated, to prevent a resurgence of the epidemic. Consequently, this lockdown policy is a powerful lever to control an epidemic, but this tool needs to be supplemented by alternative policies, such as those mentioned above, in order to be fully effective. If the time saved through lockdown is not exploited, it will have no impact on the final consequences of the epidemic, measured by the economic and social cost associated with the total number of people infected and deceased during the total duration of the epidemic.
Policy implications. By comparing the graphs in Figure 8, we first remark that the shape of the optimal indexation parameter Z remains the same, regardless of the simulation and the value of Z max . The control takes the most negative value possible (−Z max ) for about 20 days, then increases almost instantaneously to reach the maximum value Z max , before slowly decreasing to 0. Therefore, the optimal tax scheme set by the government is as follows. First, at the beginning of the epidemic, it seems optimal to give the population a compensation (corresponding to a negative tax) as large as possible, by setting Z = −Z max . Though this may be a numerical artefact, given that the initial values of I and its variations are extremely low, the fact that the same phenomenon appeared in virtually all our simulations tends to show that it is actually significant. We interpret this as the government anticipating the negative consequences of the lockdown policy by immediately providing monetary relief to the population.
This is exactly what happened in several countries, for instance in the USA with stimulus checks sent to every citizen, and our model endogenously reproduces this aspect. Policy-wise, it also shows that maximum efficiency for such stimulus packages is attained when they are provided to the population as early as possible. After this initial phase, when the epidemic spreads among the population, the government suddenly increases Z, so that the tax becomes positive and is in fact maximum, in order to deter people from interacting. Approaching the maturity, the government eases the lockdown little by little. However, this end of lockdown may be premature, since we have observed in the previous figures that the epidemic may restart at the end of the considered period. Indeed, considering a final time horizon is equivalent to assuming that 'the world' stops at that time: all the potential costs generated by the epidemic after T are not taken into account in the model. The government thus has no interest in implementing costly measures, whose subsequent impact on the epidemic will not be measured. Nevertheless, this boundary effect has no impact on the previous results and interpretations. Indeed, we remark in the numerical results that if we consider a more distant time T , the lockdown certainly lasts longer, but follows the exact same paths during most of the lockdown period, and its release occurs around the same time before maturity (see Figure 9 below). Moreover, the lockdown period should still end at some time, which is why a finite terminal time is assumed. This time may correspond to an estimate of the time needed to implement other more sustainable policies than lockdown, such as the implementation of an active testing policy, or to hope for the discovery of a vaccine or cure, as mentioned above. Optimal control Z Optimal tax sensitivity with respect to the lockdown duration. 
On Figure 9 , we give two trajectories of the optimal contact rate β (on the left) and the optimal indexation parameter Z (on the right) for two different maturities. It is clear that both trajectories follow the same paths until some point. Regardless of the maturity, the contact rate β and the parameter Z have the same characteristics as those shown respectively in Figures 8 and 10 . As one approaches the shortest maturity, i.e. T = 200, the parameter Z decreases towards 0 for the contract of this maturity, while the other remains at the maximum, and decreases later, as its maturity approaches. Therefore, the fact that Z decreases at maturity, as mentioned in the paragraph 'Policy implications' above appears to be a boundary effect since it is not sensitive with respect to the maturity. Optimal interaction rate and comparison with the benchmark case. We now explain the general trend of the optimal interaction rate. In the beginning, recall that Z is negative, meaning that the tax is negatively indexed on the variation of I. In other words, since I is globally (but very slightly) increasing at the beginning of the epidemic, the compensation increases with I, which means that the population is not incentivised at all to decrease their contact rate, and thus the transmission rate of the virus, which remains equal to the initial level β. Then, as the epidemic spreads, Z becomes very high, which now incentivises the population to reduce the transmission rate below β. Finally, near the end of the lockdown period, Z plunges to zero, which naturally implies that the optimal contact rate β goes back to its usual level β. By comparing with the benchmark case, we see that the tax policy succeeds in reducing significantly the interaction rate. As a consequence, and as we have seen in Figure 7 , the tax policy contains the spread of the disease during the considered time period, unlike in the case without intervention of the government. 
Figure 11: 500 simulations of the proportion I of infected in the SIR model. Comparison between the case with tax policy (but without testing) on the left (contract case) and the benchmark case on the right.
In this section, we now study the case where the government can implement an active testing policy, in addition to the incentive policy for lockdown, to contain the spread of the epidemic. This policy is similar to the one adopted by most European governments in June 2020, after relatively strict containment periods and at a time when the COVID-19 epidemic seemed to be under control. Indeed, the lockdown periods in Europe have generally made it possible to delay the epidemic, and thus to give public authorities time to prepare a meaningful testing policy by developing and increasing the number of available tests. This testing policy has two major benefits. First, it allows the identification of clusters, and therefore provides a more precise knowledge of the dynamics of the epidemic in real time on the different territories. Second, by identifying infected people, we can force them to remain isolated, in order to avoid the contamination of their relatives. This policy therefore constitutes another lever, in addition to containment, to reduce the contact rate within the population. Thus, by developing a robust testing policy, public authorities can in fact relax the lockdown while keeping the rate of disease transmission at a sufficiently low level. Therefore, comparing with the no-testing policy case, we expect that (i) the government will be able to control the epidemic at least as well as with just the lockdown policy; (ii) it will allow the population to regain a contact rate closer to the desired and initial level β. To study the optimal testing policy α, taking values in A := [ε, 1], we consider the cost of effort k given by (3.1b).
This cost function emphasises the fact that testing the entire population every day is inconceivable, and therefore results in an explosion of cost when α takes values close to 0. Recall that the parameters for the function k, namely κ g and η g , are given in Table 1b. Finally, A is discretised with a step equal to 0.05 and we consider Z max = 30. As we can see from the six selected simulations below, the control Z is very regular (see Figure 12), while the control α is less regular and concentrated at the heart of the epidemic (see Figure 13). Figure 16 gives a global overview of the 500 simulations, which confirms the intuition given by the six selected ones. Comparison between the three cases, the benchmark, with, and without testing.
Relaxed lockdown but lower effective transmission rate. First, comparing Figures 8 and 12, the optimal control Z presents the same shape in both cases, except at the beginning, since now Z is not negative initially. In fact, in this case, we observe that the government is asking for less effort from the population, and therefore the initial stimulus mentioned in the paragraph 'Policy implications' still happens, but later and for a much shorter duration. Figure 15 also shows that the optimal contact rate is closer to the initial level β, which should induce a more violent spread of the disease. Nevertheless, the control α, representing the testing policy and given by Figure 13, balances this effect. Indeed, the testing allows an isolation of targeted infected individuals, and therefore contributes to the decrease of the effective transmission rate of the disease, represented in Figure 14. Therefore, comparing Figure 16 with Figure 11, we notice that the control of the epidemic is more efficient than in the case A = {1}, since the proportion of infected is globally decreased.
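The balancing effect described above can be illustrated with the effective transmission rate β√α, which combines the contact rate β chosen by the population with the testing control α ∈ [ε, 1] (α = 1 corresponding to the case without testing). The sketch below is only an illustration: the function name and the numerical values of β and α are hypothetical.

```python
import math

def effective_transmission_rate(beta: float, alpha: float) -> float:
    """Effective transmission rate beta * sqrt(alpha), for alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return beta * math.sqrt(alpha)

# Hypothetical contact rates: a strict lockdown without testing (alpha = 1)
# versus a relaxed lockdown combined with active testing (alpha = 0.25).
strict_no_testing = effective_transmission_rate(beta=0.10, alpha=1.0)
relaxed_with_testing = effective_transmission_rate(beta=0.16, alpha=0.25)

print(strict_no_testing, relaxed_with_testing)
```

In this illustrative configuration the population keeps a higher contact rate, yet the effective transmission rate is lower than under the stricter lockdown without testing, which mirrors the comparison between Figures 14 and 15.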
Optimal contact rate β Effective transmission rate β √ α First, remark that, with the particular choice of utility functions, we have Otherwise, if ≥ 2, the optimal tax policy is equal to −∞, which cannot be optimal from the government's point of view, since it leads to an infimum on equal to +∞ (see (2.21) ). For each value of the Lagrange parameter, a two dimensional PDE with a two-dimensional control (α, β) is considered. A step discretisation for the grid in (s, i) is taken equal to (0.001, 0.001). A = [ε, 1] is discretised with 20 values and the values of β are discretised with 80 equally spaced values (to reduce the cost of optimisation). We then search for the optimal parameter with a step of 0.01 within the interval (0, 2). We obtain in this case an optimal value equal to 0.64 and we give on Figure 17 the results, which show in particular that the epidemic is controlled in a similar way as in the second-best case, with incentives and testing policy. Testing policy α Proportion of infected I Figure 17 : 500 trajectories obtained in the first-best case. The shape of the optimal controls β and α, as well as the trajectories for the proportion I of infected, are highly similar to those obtained in the previous case. The only clear difference is the principal's value. Indeed, we can compare the optimal value V P 0 for the government in the moral hazard case, to the first best value V P,FB 0 . Using 10 4 trajectories and the previously optimal control computed, we estimate V P,FB 0 = −0.249 while V P 0 = −0.287. The difference between the two values, with a relative difference of 15% only pleads in favour of our incentive model: even without being able to track all the population, governments can achieve containment strategies with very similar levels of efficiency, and costs which are not significantly higher. 
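The relative difference quoted above can be checked directly from the two estimated values, and the search grid for the Lagrange parameter can be enumerated in the same way. This is a minimal arithmetic sketch; the variable names are illustrative.

```python
# Estimated values reported in the text.
v_first_best = -0.249    # first-best value of the government
v_moral_hazard = -0.287  # value of the government in the moral hazard case

# Relative difference, measured with respect to the first-best value.
relative_difference = abs(v_moral_hazard - v_first_best) / abs(v_first_best)

# Candidate Lagrange parameters: step 0.01 inside the open interval (0, 2).
candidates = [k / 100 for k in range(1, 200)]

print(f"relative difference: {relative_difference:.1%}")
print(f"number of candidate multipliers: {len(candidates)}")
```

Each candidate multiplier requires solving one two-dimensional PDE, so the step of the search directly determines the number of PDE solves.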
This is of course partly explained by the fact that the testing is profitable both for the government and for the population, as it allows for values of β very close to its usual value β, as shown on Figure 17 . We fix a small parameter ε ∈ (0, 1) to consider the subset A := [ε, 1]. We then define by A the set of all finite and positive Borel measures on [0, T ] × A, whose projection on [0, T ] is the Lebesgue measure. In other words, every q ∈ A can be disintegrated as q(ds, dv) = q s (dv)ds, for an appropriate Borel measurable kernel (q s ) s∈[0,T ] , meaning that for any s ∈ [0, T ], q s is a finite positive Borel measure on A, and the map [0, T ] s −→ q s is Borel measurable, when the space of measures on A is endowed with the topology of weak convergence. We then define the following canonical space Ω := C 2 × A, whose canonical process is denoted by (S, I, Λ), in the sense that S t s, ι, q := s(t), I t s, ι, q := ι(t), Λ s, ι, q := q, ∀ t, s, ι, q ∈ [0, T ] × Ω. We let F be the Borel σ-algebra on Ω, and F := (F t ) t∈[0,T ] be the natural filtration of the canonical process where for any (s, Υ) Recall that in this framework F = F T . Let M be the set of probability measures on (Ω, F T ). For any P ∈ M, we let N P be the collection of all P-null sets, that is to say where we recall that 2 Ω represents the set of all subsets of Ω, and we let (ii) P (S 0 , I 0 ) = (s 0 , i 0 ) = 1; (iii) with P-probability 1, the canonical process Λ is of the form δ φ· (dv) for some Borel function φ : [0, T ] −→ A, where as usual, for any a ∈ A, δ a is the Dirac mass at a. We can follow Bichteler [18] , or Neufeld and Nutz [80, Proposition 6.6] to define a pathwise version of the density of the quadratic variation of S, denoted by σ : [0, T ] × Ω −→ R, by 10 Notice that the initial value of r 0 of R, which appears in the SIR version of the model, is irrelevant at this stage. 
Lévy's characterisation of Brownian motion ensures that the process 11 is an (F P , P)-Brownian motion for any P ∈ P. For any P ∈ P, we denote by A o (P) the set of F-predictable and A-valued process α := (α s ) s∈[0,T ] such that, P-a.s. We recall that the term λ ≥ 0 denotes the birth rate, the parameter µ ≥ 0 is the natural death rate in the population (susceptible and infected), γ ≥ 0 is the death rate inside the infected population. The parameters ν and ρ correspond to recovery rates, depending on whether we are considering a SIS or a SIR model, see the remark below for more details. (i) if ρ = 0, the constant ν ≥ 0 is the rate of recovery for infected people, going back in the class of susceptible. This case corresponds to the classical SIS model whose dynamics are described by the system (2.3); (ii) if ν = 0, the constant ρ ≥ 0 is the recovery rate for infected individual, going into a class of recovered people, whose proportion is denoted by R. This case corresponds to the classical SIR model described by (2.4) . It can be noted that our model, which results from a mixing of the SIS and SIR models, can be interpreted as an SIR model with partial immunisation, in the sense that only a part of the population develops antibodies for the disease after being infected. Thus, a proportion ρ of the infected moves to the class R, and can no longer be infected. Conversely, the proportion of the infected who do not develop antibodies reverts to the class S, and can therefore contract the disease again. This resulting model is similar to the one developed by Zhang, Wu, Zhao, Su, and Choi [111] and called SISRS. This type of model seems in fact well suited to model epidemics related to new viruses, such as the COVID-19, when the immunity of infected persons has not yet been proved. Before pursuing, we need a bit more notations, and will consider the following sets as well as, for any α ∈ A o , P(α) := P ∈ P : α ∈ A o (P) . 
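As an illustration of the mixed SIS/SIR dynamics discussed above, the following sketch simulates one path with an Euler-Maruyama scheme. The exact form of the system is an assumption made for this example: the drift uses the effective transmission rate β√α, the noise term σ α S I enters S and I with opposite signs, R follows dR = (ρI − µR)dt, and the controls β and α are frozen to constants. All parameter values are hypothetical.

```python
import numpy as np

def simulate_sir_sis(s0, i0, r0, beta, alpha, *, lam, mu, nu, gamma, rho,
                     sigma, T, n_steps, seed=0):
    """Euler-Maruyama path of (S, I, R) under the assumed dynamics."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    I = np.empty(n_steps + 1)
    R = np.empty(n_steps + 1)
    S[0], I[0], R[0] = s0, i0, r0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        infection = beta * np.sqrt(alpha) * S[k] * I[k]  # new infections
        noise = sigma * alpha * S[k] * I[k] * dW         # assumed noise term
        S[k + 1] = S[k] + (lam - mu * S[k] + nu * I[k] - infection) * dt + noise
        I[k + 1] = I[k] + (infection - (mu + nu + gamma + rho) * I[k]) * dt - noise
        R[k + 1] = R[k] + (rho * I[k] - mu * R[k]) * dt
    return S, I, R

# Hypothetical parameters: nu > 0 and rho > 0 give partial immunisation.
S, I, R = simulate_sir_sis(0.99, 0.01, 0.0, beta=0.2, alpha=0.8,
                           lam=0.0, mu=0.0, nu=0.02, gamma=0.0, rho=0.1,
                           sigma=0.1, T=50.0, n_steps=500, seed=1)
print(S[-1], I[-1], R[-1])
```

Setting ρ = 0 gives the SIS case and ν = 0 gives the SIR case. Without births and deaths (λ = µ = γ = 0), the noise cancels in the sum, so S + I + R stays constant along the simulated path.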
We will require that the controls chosen by the government lead to only one weak solution to Equation (4.2) , and are such that the processes S and I remain non-negative. We will therefore concentrate our attention to the set A of admissible controls defined by Notice that for any α ∈ A, we have σ t = σS t I t α t , dP α ⊗ dt-a.e. 11 More precisely, one should first use the result of Stroock and Varadhan [99, Theorem 4.5.2] to obtain that on an enlargement of (Ω, F T ), there is for any P ∈ P, a Brownian motion W P , and an F-predictable process, A-valued process α P such that The result for W is then immediate. Notice in addition that since W is defined as a stochastic integral, it should also depend on explicitly on P. We can however use Nutz [ Notice finally that for any α ∈ A, we have We thus deduce, using the positivity of S and I, that where for all (t, s, i) This result proves in particular that S and I are actually P α -almost surely bounded, for any α ∈ A. Moreover, if (s 0 , i 0 ) ∈ (R + ) 2 , then for all t ∈ [0, T ], both S t and I t are (strictly) positive. Note that in the SIR model described by the system (2.4), we have, for all t ∈ [0, T ], so that R t depends only on the observation of I s for s ≤ t. In addition to that The basic model from (4.2) takes into account the testing policy put into place by the government, but ignores so far the interacting behaviour of the population. We model this through an additional control process chosen by the population. More precisely, we fix some constant β max > 0 representing the maximum rate of interaction that can be considered, and we define B := [0, β max ]. Let B be the set of all F-predictable and B-valued processes. Given a testing policy α ∈ A implemented by the government, notice that the following stochastic exponential , is an (F, P α )-martingale, given that the process β/(σ √ α) takes values in 0, β max /(σ √ ε) , P α -a.s. 
Therefore, for any (α, β) ∈ A × B, we can define a probability measure P α,β on (Ω, F), equivalent to P α , by Using Girsanov's theorem, we know that the process is an (F, P α,β )-Brownian motion, and we have (4.6) At time 0, the government informs the population about its testing policy α ∈ A, as well as its fine policy χ, which for now will be an F T -measurable and R-valued random variable (a set we denote by C). The population solves the following optimal control problem The interpretation of the functions u and U is detailed in Section 2.3.1, where the population's problem was informally defined. For any (α, χ) ∈ A × C, we recall that we denoted by B (α, χ) the set of optimal controls for V A 0 (α, χ), that is to say We require minimal integrability assumptions at this stage, and insist that there exists some p > 1 such that Remark 4.4. Notice that since for any α ∈ A the Radon-Nykodým density dP α,β /dP α has moments of any order under P α (since any β ∈ B is bounded and any α ∈ A is bounded and bounded away from 0), a simple application of Hölder's inequality ensures that (4.9) implies that for any p ∈ (1, p) and any β ∈ B Recall that the government can only implement policies (α, χ) ∈ A × C such that V A 0 (α, χ) ≥ v, where the minimal utility v ∈ R is given. We denote the subset of A × C satisfying this constraint and Equation (4.9) by Ξ. 
In line with the informal reasoning developed in Section 2.3.4, the government aims at minimising the number of infected people until the end of the lockdown period, and we write rigorously its minimisation problem as Since the fine policy χ is an F T -measurable random variable, where F is the filtration generated by the process (S, I), we should expect that in general V A 0 (α, χ) = v(0, s 0 , i 0 ), where the map v : [0, T ] × C 2 −→ R satisfies an informal Hamilton Jacobi Bellman (HJB for short) equation, and as such has the dynamic In particular, defining Z := Z s − Z i , we should have Given the supremum appearing above, the following assumption will be useful for us. where, in this case, the population's Hamiltonian H : Since the dynamics of R is deterministic and not controlled, a simplification occurs between the additional part of the Hamiltonian (ρi − µr) z and the integral with respect to dR, which leads to the same form for the utility function as previously mentioned, i.e., Equation (4.11). Let us start this section by defining two useful spaces. For any α ∈ A, and any m ∈ N , we define S m (P α ) and H m (P α ) as respectively the sets of R-valued, F P α + -adapted continuous processes Y such that Y S m (P α ) < ∞, and the set of Theorem 4.7. Let (α, χ) ∈ Ξ. There exists a unique F P α + 0 -measurable random variable Y 0 and a unique Z ∈ H p (P α ) such that Proof. Fix (α, χ) ∈ Ξ as in the statement of the theorem. Let us consider the solution (Y, Z) of the following BSDE (4.14) Since χ ∈ C, u is continuous, I and S are bounded, and B is a compact set, it is immediate this BSDE is well-posed and admits a unique solution (Y, Z) ∈ S p (P α )×H p (P α ) (in a more general context, one may refer for instance to Bouchard, Possamaï, Tan, and Zhou [20, Theorem 4.1] ). Therefore, using the dynamic of I under P α , given by Equation (4.2), as well as the definition of β , and letting t = 0, we obtain that (4.13) is satisfied. 
Next, using this representation for U (χ), notice that for any β ∈ B, we have where we used the fact that Z ∈ H p (P α ), and that the process is continuous, and both an (F P α , P α )-and an (F P α + , P α )-martingale (see for instance Neufeld and Nutz [80, Proposition 2.2] ), so that for any β ∈ B The previous inequality implies that Moreover, thanks to Assumption 4.5, equality is achieved if and only if we choose the control β . This shows that In the previous result, the fact that Equation (4.13) holds with an F P α + 0 -measurable random variable and not a constant is somewhat annoying. The next lemma shows that we can actually have the representation with a constant without loss of generality. Lemma 4.8. Let α ∈ A, and fix an F P α + 0 -measurable random variable Y 0 and some Z ∈ H p (P α ). Define the following contracts Then Proof. The equalities for (α, χ) are immediate from Theorem 4.7. For (α, χ ), we have, using the fact that Z ∈ H p (P α ), and thus Z ∈ H q (P α,β ) for any β ∈ B and any q ∈ (1, p) Since the equality is attained if and only if we choose β = β , this ends the proof. We introduce the class Ξ of contracts defined by all pairs α, U (−1) (−Y y0,Z T ) with α ∈ A, and Y y0,Z a process given, P α -a.s., for all t ∈ [0, T ] by with Z ∈ H p (P α ) and y 0 ∈ [v, ∞). We also denote for simplicity P ,α,Z := P α,b (S·,I·,Z·) . Lemma 4.9. The problem of the government given by (4.10) can be rewritten Proof. From Theorem 4.7 and Lemma 4.8, we know that Ξ ⊂ Ξ. To prove the reverse inclusion, let us now consider a pair α, −U (−1) Y y0,Z T ∈ Ξ. We simply need to ensure that −U (−1) Y y0,Z T ∈ C. We have, using the fact that u is continuous, B is compact, α is bounded below by ε, and S and I are bounded, that there exists some constant C > 0, which may change value from line to line, such that where we used Burkholder-Davis-Gundy's inequality and Cauchy-Schwarz's inequality. This proves the reverse inclusion and thus that Ξ = Ξ.
Next, we use Lemma 4.8 to realise that B α, To conclude, it is enough to notice that the following map is non-increasing. Lemma 4.9 states that the problem of the government can be reduced to a more standard stochastic control problem. However, in the current formulation, one of the three state variables, namely Y , is considered in the strong formulation, while the other state variables S and I are considered in weak formulation. Indeed, the variable Y is indexed by the control Z, while the control (α, Z) only impacts the distribution of S and I through P ,α,Z . As highlighted by Cvitanić and Zhang [27, Remark 5.1.3] , it makes little sense to consider a control problem of this form directly. Therefore, contrary to what is usually done in principal-agent problems (see, e.g., [29] ), we decided to adopt the weak formulation to rigorously write the problem of the principal, since this is the formulation which makes sense for the agent's problem. We will thus formulate it below, for the sake of thoroughness. 12 Let V := R × A and consider the sets V as we defined A in Section 4.1.1. The intuition is that the principal's problem depends only on time and on the state variable X = (S, I, Y ). Following the same methodology used for the agent's problem, to properly define the weak formulation of the principal's problem, we are led to consider the following canonical space We let G be the Borel σ-algebra on Ω P , and G := (G t ) t∈[0,T ] the natural filtration of (S, I, Y, Λ P ), defined in the same way as F in the previous canonical space Ω (see Section 4.1). Let then M P be the set of probability measures on (Ω P , G T ). For any P ∈ M P , we can define G P the P-augmentation of G, its right limit G P+ , as well as F Π := (F Π t ) t∈[0,T ] the Π-universal completion of F for any subset Π ⊂ M P .
The drift and volatility functions for the process X are now defined for any (t, s, i, z, a) where u (t, s, i, z, a) := u (t, b (t, s, i, z, a) , i), for all (t, s, i, z) B P (r, S r , I r , v) · ∇ϕ P (X r ) + 1 2 Tr D 2 ϕ P (X r ) Σ P (Σ P ) (r, S r , I r , v) Λ P (dr, dv). In the spirit of Definition 4.1 for P ⊂ M, we define the subset Q ⊂ M P as the one consisting of all P ∈ M P such that (iii) with P-probability 1, the canonical process Λ P is of the form δ φ· (dv) for some Borel function φ : [0, T ] −→ V . Still following the line of Section 4.1, we know that for any P ∈ Q, we can define a (G Q , P)-Brownian motion W P . We then denote by V o (P) the set of G-predictable and V -valued process (Z, α) such that, P-a.s. and for all t ∈ [0, T ], (4.17) Thank to the analysis conducted in the previous subsection, the problem of the government given by (4.10) can now be written rigorously in weak formulation where S 3 represents the set of 3 × 3 symmetric positive matrices with real entries. More explicitly, the Hamiltonian can be written as follows We are then led to consider the following HJB equation, for all t ∈ [0, T ] and x = (s, i, y) ∈ R 3 : with terminal condition v(T, x) := −U (−1) (y), and where the natural domain over which the above PDE must be solved is 13 O := (t, s, i, y) ∈ [0, T ) × R 2 + × R : 0 < s + i < F (t, s 0 , i 0 ) , recalling that F is defined by (4.4). where v P should be understood as the unique viscosity solution, in an appropriate class of functions, of the PDE (4.20). Obtaining further regularity results is by far more challenging. Indeed, it is a second-order, fully non-linear, parabolic PDE, which is clearly not uniformly elliptic, the corresponding diffusion matrix being degenerate. This makes the question of proving the existence of an optimal contract a very complicated one, which is clearly outside the scope of our study. 
As a sanity check though, we recall that ε-optimal contracts always exist, and can be indeed approximated numerically. See for instance Kharroubi, Lim, and Mastrolia [66] for an explicit construction of such ε-optimal contracts in a particular case dealing with the stochastic logistic equation. As already mentioned, the first-best case corresponds to the case where the government can enforce whichever interaction rate β ∈ B it desires (in addition to a contract (α, χ) ∈ A × C), and simply has to satisfy the participation constraint of the population. In order to find the optimal interaction rate in this scenario, as well as the optimal contract, one has to solve the government's problem defined by (2.14). The simplest way to take into account the inequality constraint in the definition of V P,FB 0 is to introduce the associated Lagrangian. By strong duality, we then have First, by concavity of U , it is immediate that for any given Lagrange multiplier > 0, the optimal tax is constant and given by (2.19) . Then, using the definition of V 0 ( ) for any > 0 in (2.20), we have: Note that V 0 ( ) is the value function of a standard stochastic control problem. Therefore, we expect to have where the Hamiltonian is defined, for t ∈ [0, T ], (s, i) ∈ (R + ) 2 , p := (p 1 , p 2 ) ∈ R 2 and M ∈ S 2 by To simplify, let us consider separable utilities with the forms (2.17). We focus on the maximisation of the Hamiltonian H with respect to b ∈ B, to obtain the optimal interaction rate β . The maximiser b is defined by recalling that b • is defined by (2.18). In particular, for a given testing policy α ∈ A and a Lagrange multiplier > 0, the optimal interaction rate in this case is given for all t ∈ [0, T ] by β t = b S t , I t , ∂v (t, S t , I t ), α t . We thus obtain where in addition for a ∈ A, u (t, s, i, p, a) := u (t, b (s, i, p, a) , i). 
Then, the optimal testing policy is given for all t ∈ [0, T ] by α t := a (t, S t , I t , ∂v (t, S t , I t ), D 2 v (t, S t , I t )), where a : [0, T ] × (R + ) 2 × R 2 × S 2 −→ A is the maximiser of the previous Hamiltonian on a ∈ A, if it exists. 13 The boundary of the domain cannot be reached by the processes S and I, which is why it is not necessary to specify a boundary condition there. Notice though that the upper bound can formally only be attained when I is constantly 0, in which case S becomes deterministic, and the government's best choice for α is clearly 1, and its choice of Z becomes irrelevant. In such a situation, we would immediately have V P 0 = v. We now focus on the SEIR/S (Susceptible-Exposed-Infected-Recovered or Susceptible) compartment model. Again, the class S represents the 'Susceptible' and the class I represents the 'Infected' and infectious. The SEIR and SEIS models are used to describe epidemics in which individuals are not directly contagious after contracting the disease. This therefore involves a fourth class, namely E, representing the 'Exposed', i.e., individuals who have contracted the disease but are not yet infectious. With this in mind, we denote by ι the rate at which an exposed person becomes infectious, which is assumed to be a fixed non-negative constant. Therefore, during the epidemic, each individual can be either 'Susceptible' or 'Exposed' or 'Infected' or in 'Recovery', and (S t , E t , I t , R t ) denotes the proportion of each category at time t ≥ 0. The difference between SEIS and SEIR models lies in the immunity toward the disease: for SEIR models, it is assumed that the immunity is permanent, i.e., after being infected, an individual goes and stays in the class R, whereas for SEIS models, there is no immunity, i.e., infected individuals come back to the susceptible class at rate ν ≥ 0, similarly to SIS models.
As in the previously described SIR model, we also take into account the demographic dynamics of the population, through the parameters λ, µ and γ. To sum up, the epidemic dynamics is represented in Figure 18. Similarly to the previous models, we consider that the dynamic of the epidemic is subject to a noise in the estimation of the proportion of susceptible and infected individuals. Inspired by the stochastic model in Mummert and Otunuga [77, Equation (3)], we therefore consider that the dynamics of the epidemic is given by the following system Note that the proportion I of infected and infectious is also uncertain, but only through its dependence on E, and the proportion R of recovered is uncertain only through its dependence on I. More precisely, we assume that there is no uncertainty on the recovery rate ρ, the rate ι at which exposed people become infectious, or the (potential) rate ν at which an individual loses immunity, implying that if the proportion of exposed individuals is perfectly known, the proportion of infected is also known without uncertainty, and consequently the proportion of recovered is also known with certainty. Again this modelling choice is consistent with most stochastic SEIRS models, and emphasises that the major uncertainty in the current epidemic is related to the non-negligible proportion of (nearly) asymptomatic individuals. Indeed, an asymptomatic individual may be misclassified as susceptible or exposed. We will now give (informally) the optimisation problems faced by both the population and the government; the rigorous treatment can be done following the lines of Section 4. The most important change compared to SIS/SIR models is that the criteria should now depend on the sum E + I, representing the proportion of the population having contracted the disease, rather than just the proportion I of infectious people. Unless otherwise stated, the notations are those of Section 4.
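The compartment flows described above for the SEIR/S model can be summarised by the drift of (S, E, I, R). The sketch below is a deterministic skeleton only: it omits the noise term and the testing control, and the exact form of the drift is an assumption reconstructed from the verbal description (new infections at rate βSI, exposed becoming infectious at rate ι, recovery at rate ρ, return to the susceptible class at rate ν, births λ, natural deaths µ, and disease-related deaths γ). All numerical values are hypothetical.

```python
from typing import NamedTuple, Tuple

class SeirParams(NamedTuple):
    lam: float    # birth rate
    mu: float     # natural death rate
    gamma: float  # death rate inside the infected population
    nu: float     # rate of return from I to S (SEIS case)
    rho: float    # recovery rate from I to R (SEIR case)
    iota: float   # rate at which an exposed person becomes infectious
    beta: float   # transmission rate, frozen to a constant here

def seir_drift(s: float, e: float, i: float, r: float,
               p: SeirParams) -> Tuple[float, float, float, float]:
    """Assumed deterministic drift of (S, E, I, R); noise is omitted."""
    new_exposed = p.beta * s * i
    ds = p.lam - p.mu * s + p.nu * i - new_exposed
    de = new_exposed - (p.mu + p.iota) * e
    di = p.iota * e - (p.mu + p.gamma + p.nu + p.rho) * i
    dr = p.rho * i - p.mu * r
    return ds, de, di, dr

# Hypothetical parameters without demography, in the SEIR case (nu = 0).
params = SeirParams(lam=0.0, mu=0.0, gamma=0.0, nu=0.0,
                    rho=0.1, iota=0.2, beta=0.3)
drift = seir_drift(0.97, 0.02, 0.01, 0.0, params)
print(drift, sum(drift))
```

Without births and deaths, the four components of the drift sum to zero, so the total population is conserved, which gives a quick consistency check on the flows between compartments.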
The problem of the population is now while that of the government becomes Notice that in the cost function k, we did not replace I by I + E. This is due to the fact that this cost should scale with the volatility of I + E (see the discussion in Example 2.2), which is still σ 2 α · (S · I · ) 2 in the model (5.1). for b ∈ B. Given the supremum appearing above, and similarly to Assumption 4.5, we make the following assumption. Therefore, a straightforward adaptation of our earlier arguments will show that every admissible contract will take the form χ := −U (−1) (Y T ) where where β t := b (t, S t , E t , I t , Z t , α t ) for all t ∈ [0, T ] is the optimal control of the population. It thus remains to solve the government's problem. Unlike in the previous SIS/SIR models, there are now four state variables for the government's problem, namely (S, E, I, Y ), whose dynamic under the optimal effort of the population is as follows recalling that F is defined by (4.4). Solving (5.8) numerically is considerably more challenging, since the dimension of the problem increases. A numerical investigation seems to be complicated as far as we know, and we leave these numerical issues for future research. There is of course a plethora of generalisations of the models we have considered so far. For instance, in SEIRS (or also SIRS) models, the immunity is temporary, i.e. people in the class R may come back into the class S at rate ν. Using a similar stochastic extension of this model, it is straightforward that all our results extend, mutatis mutandis, to this case as well, albeit with one important difference: the control problem faced by the government now has 5 state variables, namely (S, E, I, R, Y ).
Even more generally, our approach can readily be adapted to compartmental models considering additional classes: for instance the SIDARTHE ('Susceptible' (S), 'Infected' (I), 'Diagnosed' (D), 'Ailing' (A), 'Recognised' (R), 'Threatened' (T), 'Healed' (H) and 'Extinct' (E)) model investigated in Giordano, Blanchini, Bruno, Colaneri, Di Filippo, Di Matteo, and Colaneri [44] for COVID-19. Of course, the price to pay is that the number of state variables in the government's problem will increase with the number of compartments, and numerical procedures to solve the HJB equation will become more delicate to implement, and could be based on neural networks.

Similarly to Section 3, we present in this appendix the numerical results obtained when considering a SIS compartmental model, whose dynamics are given by (2.3), or equivalently by (2.5) with ρ = 0. We take the same parameters as for the SIR case to model the preferences of the government and the population, i.e. the parameters given in Table 1, except for β max = 0.5. To model the SIS dynamics, we consider a different set of parameters (see Table 3), in order to obtain the same shape for the proportion of infected at the beginning of the epidemic under both SIR and SIS dynamics. This choice is made to model the fact that, at the beginning of a relatively unknown epidemic such as that of COVID-19, the proportion of infected people is observed (with noise), but the authorities do not necessarily know whether this disease allows immunity to be acquired.

Table 3: Set of parameters for the simulation of SIS dynamics

To solve the benchmark case, we follow the method described in Section 3.3, although we choose here a number of time steps equal to 600, a time step discretisation equal to 0.0025, a linear interpolator, and the optimal control β used to maximise the Hamiltonian is discretised with 200 points, giving a step discretisation of 0.005.
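The maximisation of the Hamiltonian over a discretised control, as described above, can be sketched as a plain grid search. The snippet below is a minimal illustration, not the implementation used for the results: the grid uses the 200 points and the step of 0.005 quoted in the text, the lower bound 0 is an assumption, and `toy_hamiltonian` is a hypothetical concave stand-in for the true Hamiltonian, which is not reproduced here.

```python
import numpy as np


def argmax_on_grid(hamiltonian, lower, step, n_points):
    """Return the grid point maximising `hamiltonian`, and the maximum value."""
    grid = lower + step * np.arange(n_points)   # evenly spaced control values
    values = hamiltonian(grid)                  # vectorised evaluation
    best = int(np.argmax(values))
    return grid[best], values[best]


def toy_hamiltonian(b, target=0.2):
    """Hypothetical concave function of the control, maximal at `target`."""
    return -(b - target) ** 2


beta_star, h_star = argmax_on_grid(toy_hamiltonian, lower=0.0, step=0.005,
                                   n_points=200)
print("grid maximiser:", beta_star)
```

For a concave function of the control whose maximiser lies within the range of the grid, the grid maximiser is at most half a step away from the true one, so the step size directly bounds the error made on the control.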
Once the PDE is solved, a forward simulator uses the optimal control to generate the dynamics of the proportions (S, I). As for the numerical resolution of the benchmark case for the SIR model, we implement two versions of the resolution, with variables (S, I) or (S, S + I).

Figure 19: Two simulations of the SIS model in the benchmark case (panels: optimal effort, % of infected; Simulation 1, Simulation 2). Comparison between the two methods.

As the numerical results obtained in the benchmark case when the epidemic dynamics are given by a SIS model have the same characteristics as with the SIR dynamics, we describe the graphs only briefly below.

Figure 19. As in the SIR case, the trajectories of β obtained through the two aforementioned resolutions are rather close, and the corresponding trajectories for I coincide. We plot 100 trajectories of the optimal interaction rate β, the proportion of susceptible S, as well as the proportion of infected I. The population's behaviour is similar to the one obtained in the SIR model: first the population behaves as usual, then begins to reduce β, which finally goes back to its usual values as the epidemic disappears. Once again, we see that the population's fear of infection is not sufficient to prevent the epidemic.

As for the benchmark case, the numerical method to obtain the optimal lockdown policy is similar to the one used in the case of SIR dynamics. We only recall here the key points of the method. We first solve (2.22) with the semi-Lagrangian scheme, taking v given by (3.2), estimated with a Monte Carlo method using an Euler scheme with 600 time steps and 10^6 trajectories. The estimated value for v is −0.0878063. We then take a step discretisation for the grid in (s, i, y) corresponding to (0.0025, 0.0025, 0.005), leading to a number of meshes at maturity equal to 150 × 120 × 1200. We consider the bounded set of values [−10, 10] for the control Z, and a step discretisation equal to 0.5.
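The Monte Carlo estimation with an Euler scheme mentioned above can be illustrated by the following Python sketch. It is a simplified example under explicit assumptions: the SIS dynamics have no demographic terms, the noise is proportional to S·I, the control β is constant, and the estimated quantity is a hypothetical payoff (the time-averaged proportion of infected), not the value v defined by (3.2). The number of trajectories is reduced from 10^6 to 10^4 to keep the run short.

```python
import numpy as np


def mc_average_infected(beta, nu, sigma, s0=0.99, i0=0.01, horizon=10.0,
                        n_steps=600, n_paths=10_000, seed=0):
    """Monte Carlo estimate of the time-averaged infected proportion.

    Uses an Euler scheme on an ASSUMED stochastic SIS model and returns
    (estimate, standard_error).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    s = np.full(n_paths, s0)
    i = np.full(n_paths, i0)
    integral = np.zeros(n_paths)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Net S -> I transfer: noisy infections minus recoveries back to S.
        transfer = s * i * (beta * dt - sigma * dw) - nu * i * dt
        s, i = s - transfer, i + transfer
        integral += i * dt                  # running time integral of I
    payoff = integral / horizon
    return payoff.mean(), payoff.std(ddof=1) / np.sqrt(n_paths)


# Hypothetical parameter values, chosen only for illustration.
estimate, std_error = mc_average_infected(beta=0.5, nu=0.1, sigma=0.1)
print(f"estimate = {estimate:.4f} +/- {1.96 * std_error:.4f}")
```

The standard error decreases like the inverse square root of the number of trajectories, which is why a large sample is needed to report an estimate with many significant digits.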
The graphs obtained are briefly described below. We present some trajectories of the optimal controls β and Z, as well as the resulting proportion I of infected individuals.

Figure 22. We compare on some simulations the optimal transmission rate obtained with the contract to the one obtained in the benchmark case. We see that the tax succeeds in significantly reducing the interaction rate compared to the no-tax policy case.

Comparison with the benchmark case.

Due to the larger terminal time horizon, the computation time is particularly significant. To reduce it, the step discretisation used to find the optimal control Z is coarsened to 1. The resulting graphs are briefly described below. We present trajectories of the optimal controls β, α and Z, and the resulting proportion I of infected. We compare on simulations the optimal proportion of infected with the two previous cases (benchmark case and tax policy only): with testing, the epidemic is now totally under control. We present simulations of the optimal effective transmission rate in this case, and compare it to the optimal β obtained in the benchmark case and without testing policy.

Figure 27. We present simulations of the optimal α: its quick variations explain the swift changes in the effective β.

Figure panels: optimal contact rate β, optimal testing policy α, optimal control Z, proportion I of infected.

An optimal isolation policy for an epidemic
Optimal electricity demand response contracting with responsiveness incentives
A principal-agent approach to study capacity remuneration mechanisms
An introduction to stochastic epidemic models
Comparison of deterministic and stochastic SIS and SIR models in discrete time
A simple planning problem for COVID-19 lockdown
Disability-adjusted life years: a critical review
Population biology of infectious diseases: part I
How will country-based mitigation measures influence the course of the COVID-19 epidemic?
The Lancet
Un modèle mathématique des débuts de l'épidémie de coronavirus en France
A simple stochastic epidemic
The mathematical theory of infectious diseases and its applications
Some evolutionary stochastic processes
Deterministic and stochastic models for recurrent epidemics
Optimal control of deterministic epidemics
Stability of epidemic model with time delays influenced by stochastic perturbations
Essai d'une nouvelle analyse de la mortalité causée par la petite vérole, et des avantages de l'inoculation pour la prévenir
Stochastic integration and L^p-theory of semimartingales
Contract theory
A unified approach to a priori estimates for supersolutions of BSDEs in general filtrations. Annales de l'institut Henri Poincaré
An approximation scheme for the optimal control of diffusion processes
Finite-state contract theory with a principal and a field of agents
Contact tracing mobile apps for COVID-19: privacy considerations and related trade-offs
COVID-19: extending or relaxing distancing control measures. The Lancet Public Health
On the LambertW function
Asset pricing under optimal contracts
Contract theory in continuous-time models
Moral hazard in dynamic risk management
Dynamic programming approach to principal-agent problems
COVID-19 - new insights on a rapidly changing epidemic
Optimal COVID-19 epidemic control until vaccine deployment
Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model
Optimal make-take fees for market making regulation
Capacities, measurable selection and dynamic programming part II: application in stochastic control problems
Contracting theory with competitive interacting agents
Mean-field moral hazard for optimal energy demand response management
A tale of a principal and many many agents
Contact rate epidemic control of COVID-19: an equilibrium view
Adaptive human behavior in epidemiological models
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
The effect of stay-at-home orders on COVID-19 infections in the United States
Dynamics of a stochastic SIS epidemic model with nonlinear incidence rates
STochastic OPTimization library in C++. HAL preprint hal-01361291
Modelling the covid-19 epidemic and implementation of population-wide interventions in italy
A model of incentive compatibility under moral hazard in livestock disease outbreak response
Livestock disease indemnity design when moral hazard is followed by adverse selection
A stochastic differential equation SIS epidemic model
On the statistical measure of infectiousness
Optimal quarantine strategies for COVID-19 control models
A model for communicable disease control
Optimum control of epidemics
The Milroy lectures on epidemic disease in England - the evidence of variability and of persistency of type
Optimal control of epidemics with limited resources
Trauma does not quarantine: violence during the COVID-19 pandemic
Aggregation and linearity in the provision of intertemporal incentives
Continuous-time principal-agent problem in degenerate systems
On the responsible use of digital data to tackle the COVID-19 pandemic
A stochastic model for the optimal control of epidemics and pest populations
Asymptotic behavior of global positive solution to a stochastic SIR model
Thucydides translated into English, to which is prefixed an essay on inscriptions and a note on the geography of Thucydides, volume I
How to run a campaign: optimal control of SIS and SIR information epidemics
Beyond just "flattening the curve": optimal control of epidemics with purely non-pharmaceutical interventions
Deterministic and stochastic epidemics in closed populations
A contribution to the mathematical theory of epidemics
Optimal control of an SIR epidemic through finite-time non-pharmaceutical intervention
Regulation of renewable resource exploitation
On the extinction of the S-I-S stochastic logistic epidemic
The theory of incentives: the principal-agent model
Optimal control applied to biological models. Mathematical and computational Biology series
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Second order backward SDE with random terminal time
Random horizon principal-agent problem
Principal-agent problem with common agency without communication
Applications of mathematics to medical problems
The dynamics of crowd infection
On the optimal control of a deterministic epidemic
Parameter identification for a stochastic SEIRS epidemic model: case study influenza
The quasi-stationary distribution of the closed endemic SIS model
On the quasi-stationary distribution of the stochastic logistic epidemic
Measurability of semimartingale characteristics with respect to the probability law
Pathwise construction of stochastic integrals
Information technology-based tracing strategy in response to COVID-19 in South Korea - privacy controversies
The optimal COVID-19 quarantine and testing policies
An explicit optimal intervention policy for a deterministic epidemic model
Stochastic control for a class of nonlinear kernels and applications. The Annals of Probability
Privacy-preserving contact tracing of covid-19 patients
Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions
The prevention of malaria
Some a priori pathometric equations
An application of the theory of probabilities to the study of a priori pathometry - part I
The economics of contracts: a primer
Quantitative guidelines for communicable disease control programs
A continuous-time version of the principal-agent problem
Calculating QALYs, comparing QALY and DALY calculations
The first-order approach to the continuous-time principal-agent problem with exponential utility
Dynamic control of modern, network-based epidemic models
Optimal control of some simple deterministic epidemic models
The interpretation of periodicity in disease prevalence
Multidimensional diffusion processes
Some models in epidemic control
The benefits and costs of using social distancing to flatten the curve for COVID-19
Susceptible-infected-recovered (SIR) dynamics of COVID-19 and economic impact
Stability of a stochastic SIR system
Incentive systems under ex post moral hazard to control outbreaks of classical swine fever in the Netherlands
Some non-monotone schemes for time dependent Hamilton-Jacobi-Bellman equations in stochastic control
On the asymptotic behavior of the stochastic and deterministic models of an epidemic
Optimal isolation policies for deterministic and stochastic epidemics
Can we contain the COVID-19 outbreak with the same measures as for SARS?
On dynamic principal-agent problems in continuous time
Where now for saving lives? Law and Contemporary Problems
Epidemic spreading on a complex network with partial immunization