key: cord-0193376-1w1iiuo4 authors: Zugarini, Andrea; Meloni, Enrico; Betti, Alessandro; Panizza, Andrea; Corneli, Marco; Gori, Marco title: An Optimal Control Approach to Learning in SIDARTHE Epidemic model date: 2020-10-28 journal: nan DOI: nan sha: 042d1f4a9f8437845384d05c8a4613986ad20df3 doc_id: 193376 cord_uid: 1w1iiuo4 The COVID-19 outbreak has stimulated the interest in the proposal of novel epidemiological models to predict the course of the epidemic so as to help planning effective control strategies. In particular, in order to properly interpret the available data, it has become clear that one must go beyond most classic epidemiological models and consider models that, like the recently proposed SIDARTHE, offer a richer description of the stages of infection. The problem of learning the parameters of these models is of crucial importance especially when assuming that they are time-variant, which further enriches their effectiveness. In this paper we propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data. We formulate the problem in terms of a functional risk that depends on the learning variables through the solutions of a dynamic system. The resulting variational problem is then solved by using a gradient flow on a suitable, regularized functional. We forecast the epidemic evolution in Italy and France. Results indicate that the model provides reliable and challenging predictions over all available data as well as the fundamental role of the chosen strategy on the time-variant parameters. The novel coronavirus that emerged in Wuhan, China, at the end of 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1] , quickly spread in China and then to the rest of the world. As of September 30th 2020, at least 215 countries have been impacted, with over 33 millions detected cases, and over 1 million deaths 1 . Huge efforts are underway to contain the pandemic. In absence of specific vaccines or effective drugs against COVID-19, the disease caused by SARS-CoV-2, governments have resorted to non-pharmaceutical interventions to prevent its spread, such as social distancing, mask wearing, isolation of the infected and their contacts, and in many cases national lockdowns. ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A. Zugarini In the meantime, many researchers have focused their efforts on analyzing and forecasting the spread of COVID-19 [2] , [3] , [4] , [5] , [6] . Predicting the effect of interventions, the evolution of the size of the outbreak, or the expected date for peak of active cases, are all results of paramount importance, that help policy makers to take the best decisions in the face of uncertainty. In order to obtain these results, a widely used class of epidemiological models is that of compartmental models, such as the classical Susceptible-Infectious-Recovered (SIR) [7] and the Susceptible-Exposed-Infectious-Recovered (SEIR) models [8] . Compartmental models partition the population in disjoint groups, and, under the assumption of a homogeneous and uniformly mixed population [9] , they model the dynamics of each group as a system of constant-coefficient nonlinear Ordinary Differential Equations (ODE). The SIR and SEIR models, as well as variants such as SIDR [10] and SEIRDC [11] , have been widely used to model the COVID-19 pandemic [2] , [12] , [13] , fitting the model parameters to the available public data. The mathematical properties of these models, such as the existence of a threshold phenomenon, the possibility to estimate the final size of the epidemic, the maximum number of infectious individuals at a given time and so on, are well-known and described for example in [9] , [8] , [14] . An issue with fitting the standard SIR and SEIR models to publicly available data is the existence of a large fraction of undetected but infectious cases. As discussed in [6] and [4] , these undetected infections can often go unrecognized due to mildness of symptoms or lack thereof, thus exposing a far greater portion of the population to the virus than it would otherwise occur. In order to face the transmission due to undetected cases, in [4] , the authors consider a new epidemiological model, SIDARTHE, which extends the classical SIR model by discriminating between detected and undetected cases of infection, and different severity of illness. The complex dynamic of the model is well suited for forecasting multiple aspects of the infection spread, and it achieves very interesting performance on predicting the pandemic evolution in both the Italian and French territories. Typically, all the compartmental models assume rate coefficients to be constant in time. However, this assumption yields quite poor approximations over large observation windows during an outbreak. Clearly, the diffusion of a virus depends on multiple aspects that can change over time. A striking example is the case of national lock-downs aimed at dramatically containing the spread of the disease. In [4] , this issue is dealt with by assigning piece-wise constant coefficients in correspondence of lockdown policies changes, while in [15] , the arXiv:2010.14878v1 [cs.LG] 28 Oct 2020 authors model the coefficients as constant values separated by three linear transitions. However, such solutions require either precise knowledge of when and how the scenario changes or at least fixing a priori the number of breakpoints. This becomes unfeasible when there are multiple interacting phenomena such as local lockdowns, virus mutations, variations of treatments, therapies or infection screening. In this paper, we propose an approach for learning time variant parameters of dynamic compartmental models and present an approach that nicely reflects the spirit of most machine learning algorithms. We formulate the learning process within the framework of optimal control theory [16] , [17] , [18] . Then we attack the problem of parameter estimation by using a gradient flow algorithm that, throughout the paper, is referred to as GF . The algorithm, which alternates steps of ODE solutions with gradient estimation, is shown to be very effective thanks to an appropriate regularization of the model parameters which properly identifies their weight along the temporal window of simulation. In particular, we learn time-variant coefficients of SIDARTHE, but clearly the algorithm is suitable for any other compartmental model whenever supervised data is available. This paper is organized as follows. After a brief review of SIDARTHE (Section II), we introduce the proposed learning framework in Section III, and report the experiments in Section IV. Finally, some conclusions are drawn in Section V. In order to define the terminology and the notations that we will use in the remainder of the paper, in this section, we give a brief review of SIDARTHE [4] . The model is a dynamical system described by eight ordinary differential equations in the variables Each of these quantities represents the population of a different compartment of the model at a certain temporal instant t. In particular each temporal instant t is mapped to: The problem is then formally defined in terms of the Cauchy problem for the following ODE system 2 where α, β, γ, δ, ε, ζ, η, θ, κ, λ, µ, ν, ξ, ρ, σ, φ, χ, τ, (3) are the rates that specify the velocity of the flows between the compartments of the model, with the initial conditions In particular (see also Fig. 1 ) we have that α, β, γ and δ are the infection rates between S and I, D, A and R respectively. Notice that these rates could be compared with the infection rate of the plain SIR model (the term in front of the bilinear term in the update rules of the susceptible and the infected). The coefficients ε and θ govern the rate at which the asymptomatic and symptomatic undetected infected I and A are detected, while ζ and η are responsible for the transition between the asymptomatic and symptomatic classes (namely from I and D to A and R). The quantities µ and ν control the flow from the symptomatic infected detected R and the symptomatic infected undetected A to the acutely symptomatic infected class T that, in turn, is connected to the set of deceased individuals E through the rate τ . We also extend the SIDARTHE model presented in [4] with connections from A and R to E, namely φ and χ, to detect deceases outside Intensive Care Units (ICUs), as the ones occurred in elderly care facilities. Finally, κ, λ, ξ, ρ and σ represent the recovery rates. Since the flows of the population through the eight compartments are directed (indeed the graph in Fig. 1 is a dag) all the rates must be non-negative. The constants S 0 , I 0 , D 0 , A 0 , R 0 , T 0 , H 0 and E 0 in Eq. (4) are assumed to be real non-negative values and coupled with the SIDARTHE differential equations they specify a Cauchy problem. Notice that if I 0 = D 0 = A 0 = R 0 ≡ 0 then the infection cannot begin. From Eq. (2) it is also immediate to see that the total population is conserved sinceṠ +İ +Ḋ + A +Ṙ +Ṫ +Ḣ +Ė = 0. As it is argued in [4] an appropriate definition of the basic reproduction number in this model is Let D(·, u, z 0 ) be the solution for the variable D of Eq. (2) when the coefficients are the components of the function u, and the initial values of the compartments are specified by the values of z 0 ∈ R 8 . In a similar manner let us also define R, T and E so that each of such quantities, considered as functions of all their arguments, maps which, roughly speaking, represents the number of diagnosed individuals who recovered when we initialize Eq. (2) with z 0 and for a given choice u of the various rates. The quantities D, R, T , H d and E are the basic ingredients to define the risk that we will use to define the learning task. Indeed let us define ϕ : [0, T ]×X → R the following quadratic error 3 Here we assume that X is Hilbert. whereD,R,T ,Ĥ andÊ are the observed time series and e D , e R , e T , e H and e E are positive constants. with m > 0. Notice that this is an integral of a Lagrangian that is non-local in time since ϕ depends on the whole trajectory of the variables u and not just on the their values at time t. Then, the learning of u corresponds to the following optimization problem min This problem resembles identification and optimal control problems that are associated with the minimization of F , that can be tackled by means of the theory of Lagrange's multipliers [16] . In this case the states z are promoted to variables of the problem, so that the quadratic error ϕ can be written directly in terms of the components of z; for example the term (D(t, u) −D) 2 → (z 3 −D) 2 . The minimization problem then is solved under the constrained dynamic of z given by the SIDARTHE system of the formż(t) = Φ(z(t), u(t)) (Φ here can be deduced by the right-hand-side of the differential equation in (2)). Then the problem Eq. (8) can be recast into the following form: where Y is an appropriate functional space that contains functions z satisfying the initial condition z(0) = z 0 and Following this approach, the solution is usually achieved by imposing the stationarity condition on (9) , which yields the Euler-Lagrange differential equations with appropriate boundary conditions over the temporal variable t. In this problem, however, the presence of the additional non-locality due to (10) that persists also in the reformulation (9) requires, for instance, to regard H d as another variable and to add the differential equation forḢ d , that can be readily be inferred from Eq (10), to the constraintsż = Φ(z, u). While the parameter estimation based on Eq. (9) constitutes by itself a very interesting and promising research direction, in this paper, we propose to pursue the minimization of (7) through a more direct approach, i.e. by approximating the gradient flow u = −∇F (see [19] ) by an explicit method that updates the trajectories t → u(t) starting from a fixed initial configuration u 0 ∈ X. For this reason we are referring to the learning approach proposed in this paper as GF (Gradient Flow). In practice the proposed learning algorithm is an implementation of the following update rule where ∇F is (when it exists) the Fréchet derivative (see [20] ) of F and u 0 ∈ X is assigned. The term |u| 2 /2 in Eq. (7) is extremely important for the well-posedness of the learning problem: the minimization of the mean quadratic loss alone could in principle lead to highly irregular solutions that have a low degree of generalization power. Moreover the term u L 2 gives coerciveness to the whole functional making it more suitable to be the objective of a minimization problem. This term yields a parsimonious solution where abrupt changes are penalized. Due to the presence of |u| 2 , the stationarity condition on the functional in Eq. (7) also suggests that the derivatives of stationary points of F on the boundaries t = 0 and t = T must be vanishing, thus offering an interesting consistency check for the numerical solutions that we find. Indeed, we verified experimentally that this condition generally holds true on the learned parameters. Before going on to the description of the algorithm in terms of a time discrete version of (11) which is machine implementable, we notice that we can softly enforce the positivity of the parameters by adding to the functional F the term e P in its domain can be tought as the concatenation of all the parameters sampled at the time grid defined above. The discrete counterpart of (11) is a classical gradient descent method which starts from the value x 0 (whose components are the sampling of u 0 on the various t j ) and compute the vector sequence x 1 , x 2 , . . . according to the update rule where π > 0 is the learning rate, and where now ∇f is the ordinary gradient in R 18(N +1) which can be computed at each step once we choose a numerical solver for Eq. (2) . After that the SIDARTHE equations are numerically integrated, the non-local term ϕ becomes simply a function of the variable x ∈ R 18(N +1) . We found that the update rule (12) that defines the gradient flow suffers a normalization problem which affects the parameters at different time. Basically, since the term ϕ in Eq. (7) depends on the variables x though a numerical integration of (2), changes in early (in time) parameters result in greater variations of ϕ than changes in later (in time) parameters, since the latter will only affect the solution of the SIDARTHE in the last part of the interval [0, T ], whereas the former parameters contribute to modify the potential ϕ on most of the interval. This suggest that the components of ∇f that corresponds to t i close to T are negligible with respect to the same quantity evaluated at earlier times. This makes the learning process either extremely unstable or exceedingly slow. In order to overcome this problem we modified the update rule (12) by introducing a regularization that is conceived for propagating the gradients from earlier to later times: wherex i ∈ R 18 for i = 0, . . . , N are the slices of x that correspond to the value of the parameters at time t i . The functionf is defined accordingly (for further details see Appendix A). We choose where π 0 , a and b positive parameters. Note that the above scheme makes sense when we pass to the continuous limit with respect to the index i. For this reason the following proposition is of interest Proof. It is sufficient to notice that, since we look for a solution which is continuous in t in the limit ∆t → 0 we must have, for each k ≥ 0, |x k i−1 −x k i | → 0. Then Eq. (13) in the continuous limit becomes which is exactly what we wanted to prove. This proposition shows that the update scheme defined in Eq. (13) is basically equivalent to introducing an increasing, time-dependent learning rate. Notice that, the term proportional to ω in Eq. (13) is reminiscent of the classic momentum term [21] . However, it also involves relations in the temporal domain which are neglected in other frameworks. Due to this similarity, in what follows, we refer to this term as the temporal momentum. Function π(t)/(1 − ω(t)) is, with appropriate choices of the parameters a and b, a monotonically increasing function for t > 0 and in particular for large t we have π(t)/(1 − ω(t)) π 0 e bt /at. The analysis of our learning framework is carried out on the Italian 5 and French 6 epidemiological data, gathered from official daily reports up to September 30, 2020. This section is divided in two parts. First, we discuss the results of the ablation study, confirming the importance of both the regularization term (Eq. (7)) and the update rule (Eq. (13)) for the learning process. Then, we fit SIDARTHE on the Italian and French data. The code to reproduce all the experiments is available online 7 . The differential equations were solved by Heun's method [22] , implemented in PyTorch [23] . The automatic differentiation in PyTorch computes the gradient ∇f for each time-variant parameter. The learning of the SIDARTHE rates is performed via the update rule in Eq. (13) , under the constraint on the first order derivativeu(t) introduced in Eq. (7) . The impact of these two components (update rule and first order constraint) on the learning process depends on the values of the hyper-parameters {a, b} in Eq. (14) and m in Eq. (7), respectively. The aim of this section is to discuss and quantify the role of these hyper-parameters. The Italian data set alone is considered in this section. Each experiment is repeated 20 times, provided with a random initialization x 0 of the model parameters. Unless specified differently, the training data set counts 120 consecutive data points and the subsequent 20 samples are used for test. Additionally, we trained other 20 models where temporal momentum (henceforth, momentum) was disabled (i.e. ω i = 0 for all i, thus reducing to a standard gradient descent). In total, we trained 5 × 6 × 20 + 20 = 620 different models. The results are presented in Figs. 2a and 2b . The plots clearly show that the momentum term improves the stability of the learning process. In particular, we see that the improvement saturates for b > 0.1. Conversely, a > 0 deteriorates the performances. Since the y axis is plotted with logarithmic scale, the confidence intervals are even narrower for b > 0.1 and a = 0. We then performed a second grid search on the hyperparameter space of {b, T }, where the values of b are the same as described above, and we considered 5 equally spaced values of T in [40, 120] . In this case too, for each value of the pair {b, T }, 20 models were trained. In addition, for each value of T , we trained additional 20 models with momentum disabled. In this setting, a total of 5 × 6 × 20 + 5 × 20 = 700 models were trained. Results are presented in Fig. 2c . The plot shows that when the momentum term is disabled, the model performs poorly on test, and the learning has wider confidence intervals. Instead, when momentum is enabled with a high enough value for b, the test loss becomes lower and the confidence intervals significantly narrow down. These experiments show that the momentum term dramatically improves the learning process, by further minimizing the (test) loss function and also reducing the dependency on the initial value u 0 . b) Regularization: to evaluate the effectiveness of the derivative term, we performed a grid search on the hyperparameter space of m ∈ {0, 1., 10 3 , 10 5 , 10 8 , 10 11 , 10 13 }. For each value of m we trained 20 models, for a total of 7 × 20 = 140 trained models. We plot the test loss as function of the weight, as shown in Fig. 3a and Fig. 3b . The results show that, except for m = 10 11 , the derivative term is not significantly changing the test loss. Instead, we see that the norm of the derivative of the parameters steadily decreases for m > 10 5 . This means that the derivative term contributes to enforcing parameters as smoother functions of time, without significantly degrading the generalization of the learning. We forecast the epidemic spreading in Italy and France. We trained our models in the time span going from February, the 24th, to August, the 30th, i.e. overall 188 days. The following 31 days were used for validation and test. In particular, we considered the period August, the 31st up to September, the 6th, for validation (7 days) and September, the 7th, up to September, the 30th, for test (25 days). The fitting was performed on the time series appearing in the functional risk F in Eq. (7) . i.e.D,R,T ,Ĥ,Ê. These values are all available from the Italian reports, whereas in the French official data, onlyR,T ,Ê are explicitly observed, along with the cumulative number of infectious and the number of hospitalized individuals that recovered, defined here as C I (t) and H h (t), respectively. Instead, hospitalized infected individuals corresponding to D (i.e. the proxy of the asymptomatic detected people) are not traced. Consequently, we did not have direct information about their number and recovery date. To extractD andĤ we made the assumption that asymptomatic individuals heal after a period d, that was set to 14 days, i.e. the quarantine period commonly established by national governments. In such a way, active asymptomatic infectiousD and recovered individualsĤ were estimated (at time t) as follows: Moreover, some daily French reports have partial or total missing information, causing the presence of many missing data. Due to the rich presence of noise and missing data in the early stages of French outbreak, the model fitting begins from March, the 17th instead of February, the 24th, while validation and test dates were left unchanged. The remaining missing targets within training/validation/test periods were simply ignored for learning and evaluation. In addition to restricting the space of the solutions to the problem in Eq.(8) by means of the first order constraint (previously discussed), we furthermore decided to reduce the number of learnable parameters. The learning of some pairs of parameters was tied together. In particular we tied β and δ, ξ and κ, λ and ρ, η and ζ. As initial conditions of the dynamical system we used the following values: where N is the size of the population considered. We found that starting from a good initialization of the parameters u(t) facilitates the learning and leads to better results. In the Italian case, we initialized all the parameters with the values provided in [4] , whereas for the French data set we initialised u(t) as a constant (not time dependent) such that R 0 = 1.95. We performed model selection based on the best solution in the validation period. The best models were obtained through grid search in the space of the hyper-parameters. In particular the positive constants e T , e R , e D , e H , e E that weigh the terms T , R, D, H d , E in the functional risk F , the coefficient m that acts on the derivative term |u| 2 /2, the factor e p that enforces the positivity of the solutions, the parameters a and b that define the ω function in Eq. (14) span the hyper-parameters space of the learning method. Based on the findings of the ablation study in Section IV-A, we can narrow the search in the hyper-parameters space by setting a = 0, b ∈ [0.05, 0.125] and m ∈ [10 5 , 10 11 ]. The learning rate π 0 , was set to 10 −5 . In order to provide a better understanding of the predictive capabilities of our model, we also report in Table I the Mean Absolute Percentage Error (MAPE) and the fraction of test days where the model predictions are beyond a certain tolerance error threshold, that we call d. We conclude this section with some remarks specific to each data set. a) Italy: the epidemic spreading in Italy is showed in Fig. 4 . It turns out that the model predictions are quite accurate over windows of a few weeks. The MAPE is on average always under 20% for each state variable, moreover it remains below the tolerance threshold of 30% in the test with the exception of the last 5 days of D (see Table I ). The obtained basic reproduction number R 0 reflects consistently the epidemic spreading and its values are coherent with the results reported in [24] for single Italian regions. It is worth mentioning that in this paper R 0 refers to the system dynamics interpretation associated with SIDARTHE model, which might somewhat depart from other estimations. All the model parameters are presented in Figure 5 . Interestingly enough, recovery rates σ, ξ (tied with κ), ρ (tied with λ) associated to ICU patients, detected and undetected symptomatic individuals, respectively, steadily increase over time, suggesting that hospitals are more and more prepared and trained to face the complications linked with the virus. b) France: the French outbreak forecast is presented in Fig. 6 . Despite the significant presence of noisy and missing data, we observe that the model succeeds in forecasting the state variables T, H, E, always within the tolerance. Instead, the state variable R is clearly overestimated. We believe it is caused by two main reasons: first, the overestimation of D overflows to R, and second an abrupt change in the data distribution, since the growth of target dataD(t) is not reflected by a similar increase ofR(t). The trend of the model parameters (see Fig. 7 ) is similar to the one obtained for Italy. Recovery rates tend to grow, detection rates quickly increase and then stabilize, symptoms development decreases significantly. In this paper we have discussed the problem of learning time-variant coefficients in compartmental models, with special attention on SIDARTHE [4] , a recently introduced epidemiological model which offers a very rich description of the stages of an epidemic infection. The major contribution of the paper consists of extending the challenging features of SIDARTHE model to the case of time-variant parameters that are properly learned from examples. This is carried out within a functional formulation of learning which is based on a special interpretation of gradient-flow, which allowed us to obtain a reliable forecasting of most critical indicators of the outbreak severity (i.e. deaths, recoveries and hospitalized in ICU individuals) of the COVID-19 epidemic outbreak. A massive experimentation in Italy and France has shown promising results over large windows in the last few months. We are confident that the proposed enrichment of SIDARTHE model, which is one of top level models for COVID-19 prediction, might be useful for supporting critical policies to face the diffusion of the infection all around the world. Given a function u ∈ X and the temporal partition 0 ≡ t = t 0 < t 1 < · · · < t N ≡ T , in this appendix we show how to explicitly construct its discrete counterpart as an element of the domain of the function f and subsequently how to rearrange its components to precisely define the quantitiesx andf that are used in Eq. (13) . Let u i,j := u i (t j ) the components of the matrix U ∈ R 18×(N +1) whose rows are the sampling on the temporal partition t 0 , t 1 , . . . , t N of the coordinates of u. Instead of working with matrices we exploit the isomorphism between R 18×(N +1) and R 18(N +1) that maps U → vec(U ) := (u 1,0 , u 2,0 , . . . , u 18,0 , . . . , u 1,N , . . . , u 18,N ) . With this mapping we can transform the initial point u 0 of the flow defined by (11) into the initial point x 0 necessary to start the gradient descent in Eq. (12) 8 : x 0 = vec((u 0 i,j )) ∈ R 18(N +1) . The relation between x andx j for j = 0, . . . , N and between f andf instead naturally follows once we explicitly 8 We adopt the notation (a ij ) to denote the matrix whose ij-th element is a ij . Notice that with this definition (∇f ) i ∈ R 18 for all i = 1, . . . , N + 1, while (∇f ) j ∈ R for all j = 1, . . . , 18(N + 1). This being said all quantities used in Eq. (13) are precisely defined once we specify thatx is used as a shortcut for x 0 , . . . ,x N . We thank Stefano Merler (FBK) for insightful discussions. 9 . Professor Gori is primarily interested in machine learning with applications to pattern recognition, Web mining, game playing, and bioinformatics. He has recently published the monograph "Machine Learning: A constraint-based approach," (MK, 560 pp., 2018), which contains a unified view of his approach. His pioneering role in neural networks has been emerging especially from the recent interest in Graph Neural Networks, that he contributed to introduce in the seminal paper "Graph Neural Networks," IEEE-TNN, 2009.Professor Gori has been the chair of the Italian Chapter of the IEEE Computation Intelligence Society and the President of the Italian Association for Artificial Intelligence. He is a Fellow of IEEE, a Fellow of EurAI, and a Fellow of IAPR. He was one the first people involved in European project on Artificial Intelligence CLAIRE, and he is currently a Fellow of Machine Learning association ELLIS. He is in the scientific committee of ICAR-CNR and is the President of the Scientific Committee of FBK-ICT. He has been recently invited by the Agence Nationale de la Recherche of France to be a member of the pool of experts for the French national research plan on AI. Dr. Gori is currently holding the 3IA Chair position at the Université Cote d'Azur. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak The challenges of modeling and forecasting the spread of covid-19 Modelling the covid-19 epidemic and implementation of population-wide interventions in italy Epidemic Model Guided Machine Learning for COVID-19 Forecasts in the United States Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing Containing papers of a mathematical and physical character The mathematics of infectious diseases Mathematical Biology: I. An Introduction Data-based analysis, modelling and forecasting of the covid-19 outbreak A conceptual model for the coronavirus disease 2019 (covid-19) outbreak in wuhan, china with individual reaction and governmental action Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Impact of non-pharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand Stochastic epidemic models: a survey Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions Linear Optimal Control Systems Optimal Control of Systems Governed by Partial Differential Equations, ser. Grundlehren der mathematischen Wissenschaften Adaptive finite element methods for optimal control of partial differential equations: Basic concept Gradient flows: in metric spaces and in the space of probability measures A primer of nonlinear analysis Parallel distributed processing ser. Texts in Applied Mathematics Automatic differentiation in pytorch The relationship between human mobility and viral transmissibility during the covid-19 epidemics in italy