key: cord-0992573-g4japnjk
authors: Carli, Raffaele; Cavone, Graziana; Epicoco, Nicola; Scarabaggio, Paolo; Dotoli, Mariagrazia
title: Model predictive control to mitigate the COVID-19 outbreak in a multi-region scenario()
date: 2020-10-01
journal: Annu Rev Control
DOI: 10.1016/j.arcontrol.2020.09.005
sha: 1e222abe270a949acb795d73086a2164f99001db
doc_id: 992573
cord_uid: g4japnjk

The COVID-19 outbreak is deeply influencing the global social and economic framework, due to restrictive measures adopted worldwide by governments to counteract the pandemic contagion. In multi-region areas such as Italy, where the contagion peak has been reached, it is crucial to find targeted and coordinated optimal exit and restarting strategies on a regional basis to effectively cope with possible onset of further epidemic waves, while efficiently returning the economic activities to their standard level of intensity. Differently from the related literature, where modeling and controlling the pandemic contagion is typically addressed on a national basis, this paper proposes an optimal control approach that supports governments in defining the most effective strategies to be adopted during post-lockdown mitigation phases in a multi-region scenario. Based on the joint use of a non-linear Model Predictive Control scheme and a modified Susceptible-Infected-Recovered (SIR)-based epidemiological model, the approach is aimed at minimizing the cost of the so-called non-pharmaceutical interventions (that is, mitigation strategies), while ensuring that the capacity of the network of regional healthcare systems is not violated. In addition, the proposed approach supports policy makers in taking targeted intervention decisions on different regions by an integrated and structured model, thus both respecting the specific regional health systems characteristics and improving the system-wide performance by avoiding uncoordinated actions of the regions. The methodology is tested on the COVID-19 outbreak data related to the network of Italian regions, showing its effectiveness in properly supporting the definition of effective regional strategies for managing the COVID-19 diffusion.

On December 31, 2019, the Wuhan Municipal Health Commission (China) reported to the World Health Organization a cluster of pneumonia cases of unknown etiology in the city of Wuhan, in the Chinese province of Hubei. On January 9, 2020, the Chinese Center for Disease Control and Prevention reported that a new coronavirus 5 (SARS-CoV-2, later called COVID- 19) was identified as the cause of such respiratory diseases. On March 11, 2020 , the World Health Organization declared the COVID-19 viral disease a pandemic. Since then, the COVID-19 has affected the whole world, with about ten millions of confirmed cases and five hundred thousand of confirmed deaths up to August 2020 over more than two hundred countries, areas, or territories, 10 thus becoming one of the most relevant pandemics in the recent decades [1] . Like other Coronaviruses (e.g., SARS and MERS), the COVID-19 appears to be controllable using basic Non-Pharmaceutical Interventions (NPIs), particularly social-distancing and the use of face-masks in public (especially when implemented in combinations). The factors that are obviously critically-important to the success of the anti-COVID-19 con- 15 trol efforts are the early implementation (and enhancement of effectiveness) of these measures, and ensuring their high adherence/coverage in the community [2] . However, despite the adoption of these measures, it is still possible that secondary waves of contagion occur. For instance, in China, restrictions were eased as cases declined, but by mid May, 2020, new clusters were reported, including in the city of Wuhan where the 20 virus first emerged. In effect, although the adopted countermeasures appear to have reduced the number of reported cases, the absence of herd immunity against COVID-19 suggests that contagions could easily rise again when these interventions are relaxed, as business, factory operations, and schools resume [3] .

As a consequence, in recent months there has been a growing research interest 25 on COVID-19 mitigation in different scientific fields. One of the most investigated research area aims at developing dynamical models to predict the evolution of the pandemic [4] . In effect, predicting the trend of the epidemic is of paramount importance to plan effective control strategies [5] to limit or block the spread of the epidemic. Broadly speaking, since a vaccine is not yet available, two main control strategies can be applied 30 [6], [7] : (1) mitigation, consisting in slowing but not necessarily stopping the epidemic spread (e.g., through isolation of suspect cases and social distancing of the elderly), so that the peak healthcare demand is reduced and individuals that are most at risk of severe disease from infection can be protected, and (2) suppression, which is aimed at reversing the epidemic growth, so that the number of cases is reduced, and kept low. 35 Clearly, each strategy has its own advantages and drawbacks, and the choice among the possible actions mainly relies on economic and social reasons, which slightly differ from country to country [6] . For instance, most countries have attempted to control the effect of COVID-19 by adopting a total lockdown of their population at a relevant economic cost, while few other countries have preferred timed interventions aimed at 40 reducing the number of infected people to a manageable level, depending on the capacity of the healthcare system to absorb and treat the newly infected [8] . In this regard, it is evident that the proper selection of which strategic action (or which combination of them) should be adopted to ensure the best outcomes is a challenging task.

In the pertinent literature, several models have been developed to describe the pan- 45 demic dynamics, which are based on the classic compartmental epidemiological models [9] , [10] , and adapting them to the specific case of COVID-19. Briefly said, epidemiological models, describing disease transmission within a population, provide important insights to understand which control mechanisms can lead, under what circumstances, to remove, or at least reduce, the infection [11] . More in detail, according to 50 the work in [9] , compartmental epidemiological models are typically classified on the basis of the considered compartments of individuals and the related flow patterns. In particular, labels such as M (i.e., infants with passive immunity), S (that is, the class of Susceptible people, i.e., those who can become infected), E (the class of Exposed, i.e., those who are infected but not yet infectious), I (the class of Infective individuals,i.e., 55 those who are capable of transmitting the infection), and R (the Recovered class, i.e., those with permanent infection-acquired immunity) are often used for epidemiological classes, and the threshold for most epidemiological models is the basic reproduction number R 0 , defined as the average number of secondary infections produced when one infected individual is introduced into a population of Susceptible individuals [9] . 60 Depending on the specific features of the disease to be modeled, some of the above compartments can be omitted (as an example, as shown in [12], the Exposed compartment is generally used only when the disease has a significant latent period relative to the infectious period), as well as further compartments can be identified and represented, nevertheless three of the recalled compartments should be always included in detected infected with life-threatening symptoms), H (Healed, i.e., recovered), and E (Extinct, i.e., dead). The final goal of the contribution in [5] is to estimate the impact of different actions to contain the contagion in Italy. To this aim, the authors evaluate different possible scenarios by suitably modifying some model parameters. 90 The recalled works focus on the analysis of the COVID-19 in a country at a national level; however, given the heterogeneity of economic and social features at regional level in almost any country, and particularly in Italy, it is actually essential to assess the evolution of the pandemic when applying suitable local post-lockdown strategies (that is, once the epidemic is brought under control, or in the so-called Phase 2). In 95 effect, many countries are divided into administrative regions which can independently oversee their own share of national healthcare system [16] . Nonetheless, only few contributions in the related literature take into account the spatial dynamics of the epidemic, among which we recall the work in [14] , where a spatial SEIR model for the ongoing COVID-19 emergency in Italy is developed as a baseline support tool to plan 100 the inter-regional mobility and to deploy medical supplies and staff based on the local epidemiological conditions. Similarly, in [17] a SEIIR model (i.e., including classes of Susceptible, Exposed, pre-symptomatic and symptomatic Infectious, and Removed individuals) is adopted to evaluate the impact of school closure and telework during the COVID-19 lockdown in the three most affected France regions (i.e., le-de-France, 105 Hauts-de-France, Grand Est) through a stochastic age-structured data-driven analysis. A generalized multi-region SIR model is proposed in [18] , as well as a multi-region SI 2 R 2 extension (where the considered classes are Susceptible, Infectious but not yet diagnosed, Infectious diagnosed, Removed undiagnosed, and Removed diagnosed individuals), albeit only the mathematical formulation of the models is reported while 110 the validation is not yet available. Finally, in [16] a model including the Susceptible, Infected, Quarantined, Hospitalized, Recovered, and Deceased compartments for the Italian regions is presented to evaluate the effectiveness and impact level of differentiated but coordinated local actions during Phase 2, to avoid future recurrence of the epidemic, while taking into account the specific regional healthcare systems' charac-115 teristics. The main finding of these studies is that the expected impact of mitigation measures deeply varies across different regions. Therefore, it is essential to correctly parameterize the model to the specific characteristics of each region under examination.

All the above recalled approaches, while being able to model the pandemic dynamics, do not provide a feedback control method to properly select the most beneficial 120 action(s) (for instance, based on the number of infected or of hospitalized patients) to be applied in a post-lockdown framework. In fact, these methods can be classified as open-loop techniques that typically require what-if analyses (through scenario-based evaluations or Monte Carlo simulations) to identify the most effective actions. In this perspective, it is fundamental to provide a tool for a feedback-based selection of the 125 mitigation strategy that continuously adapts to the contagion evolution. This is possible by constantly measuring and monitoring the pandemics values and adapting the policy accordingly [19] . It has been proven that an open-loop optimal control policy is successful to evaluate simple policies under the assumption of exact model knowledge, while in a more realistic scenario with uncertain data and model mismatch, a feedback 130 strategy that periodically updates the policy is much more effective [19] . In effect, the use of feedback control theory represents a powerful tool to support managing the COVID-19 outbreak [7] . Unfortunately, most of the existing literature contributions on the control of previous epidemics involves vaccines or treatments, which are currently not available for COVID-19.

In addition, in the recalled contributions, the lockdown periods are typically driven by a periodic switching logic. On the contrary, in order to ensure an active and effective support to the decision-making process, it is essential to tune the parameters of the mitigation strategy over larger time periods, providing a robust outer supervisory feedback loop to the process [8] . To this aim, the contribution in [8] proposes a fast switching 140 policy, consisting in multi-shot interventions based on the outcomes of two SIR-based models (i.e., the SIQR and the SIDARTHE) to switch between quarantine (i.e., social isolation) and work days (that is, normal behavior).

Given the obvious uncertainties in the spreading of the virus and in the disease progression, an effective feedback control strategy can be obtained by joining a SIR-based 145 epidemic model with other techniques, thus providing a hybrid model. In fact, the challenge is to determine the optimal external input across time so that a target can be reached (e.g., by optimizing a cost function) [20] . Hence, the final aim is to combine a disease transmission model with a feedback control mechanism of the epidemics, which allows controlling the whole network rather than only predicting the recovery 150 time or the proportion of infected individuals. Indeed, optimal control theory has been already successfully applied to identify the best action strategies for other diseases, typically by simply introducing into the predictive model a new control variable representing the vaccination rate at time t (see, e.g., the work in [21] , where a SIR model for dengue is proposed, and that in [22] , where a SIR model for Ebola is presented), 155 or other specifically defined control variables (such as, for instance, in [23], where an ad-hoc model for the optimal control of tuberculosis is suggested by including in the model reinfection and post-exposure interventions).

In this perspective, Model Predictive Control (MPC) is a control technique including both feedback control and optimization that allows to take into account the devia-160 tions of the predictive model from the real progress of the disease [24] , [25] . Although implementing the MPC controller typically requires a large amount of computational resources, which can lead to long computation time [26] , this is not a concern when the optimization is performed at a strategic level, as is the case of the decision-making process for the definition of the proper strategies to tackle epidemiological diseases.

The basic idea here is to keep the true system state (that is, the predicted future output of the model) in line with the selected target. This is achieved based on past control inputs and the optimal predicted control input over a prediction horizon by solving an optimization problem aimed at minimizing a cost function [27] . Basically, at regular time intervals, the values of the state variables in the prediction model are updated, 170 hence the control is re-optimized and the new strategy is applied to the system until the next update time, thus ensuring that the approximate model and the control strategy closely match at each time interval [24] . The main strength of this procedure is that it allows to directly take into account the unavoidable uncertainties in the model [27] . In effect, MPC has been already applied, in conjunction with some SIR-based predictive model, in some previous studies on epidemiological models, showing that coupling feedback control with simulation models can help designing effective and robust action strategies for managing epidemics [24] . For instance, the contribution in [20] investi-gates the dynamical control of a generic SIS (i.e., Susceptible-Infectious-Susceptible) epidemic model through a non-linear MPC method aimed at minimizing an objective 180 function (which includes the predicted future outputs of the model) over a finite predictive horizon (which is moved forward at each control step). The work in [20] shows that the MPC algorithm allows to guide the system to the desired target once the values of the control parameters are carefully chosen. A robust economic MPC for the containment of a generic stochastic SEIV (i.e., Susceptible-Exposed-Infected-Vigilant) 185 epidemic process is presented in [28] , with the final aim of deciding who to quarantine, and for how long, in the presence of an epidemic contagion. Furthermore, the work in [24] studies a generic stochastic SIR model to optimally allocate vaccination resources while minimizing the costs of an epidemics, showing that the use of MPC allows improving the disease management, reducing cost, and ensuring more robustness 190 to uncertainty, thus performing well on complex models.

In the COVID-19 framework, to the best of the authors' knowledge, the joint use of predictive epidemiological models and MPC has been studied in just very few papers. More in detail, the work in [27] proposes a methodology based on an extended SEIRmodel and MPC to determine the optimal government strategy to tackle the in Belgium under dynamic circumstances. First, the model is calibrated by means of the available data over time on the number of active cases and that of deaths. Then, the MPC is used to optimize the time course of social measures with respect to the available Intensive Care Units under three different scenarios: no government action; the current policy (i.e., with mild social restrictions); on-off strategy and immunization 200 of the herd. Similarly, the contribution in [19] combines a SIDARTHE model and a robust MPC approach that adapts the social distancing measures to minimize the number of fatalities over a range of two years when measurements are inaccurate and infection rates cannot be precisely evaluated. First, the model parameters and the initial conditions are calibrated according to the COVID-19 outbreak in Germany. Then, the 205 outcomes obtained through a closed-loop control via MPC are compared with those obtained when a multi-objective open-loop optimal control is performed, showing that the latter approach may lead to intermediate increases in the number of new infections, thus requiring an additional lockdown period, while the former method is able to avoid such a behavior, thus significantly reducing the number of fatalities.

Against this background, this paper presents a multi-region SIRQTHE model in conjunction with a non-linear MPC approach which is able to simultaneously take into account both the specific strategies undertaken at the regional level (i.e., restrictions on the intra-region activity) and the actions taken at the upper level in terms of border activities between the regions (i.e., restrictions on the inter-region activity). We remark 215 that, in this paper, we refer to the control actions to be undertaken in order to mitigate the effect of secondary waves of pandemic, that is, we are assuming that the basic non-pharmaceutical interventions (such as, for instance, social-distancing and the use of face-masks) have been ineffective or measures have been relaxed, and as such some more restrictive (but still non-pharmaceutical) measures, such as restrictions on the 220 intra-region and on inter-region activities, are needed.

The analysis is conducted in Italy, since this is one of the countries where the pandemic effect has been the most significant, particularly in some regions (such as in Lombardy), as well as diversified in the territory. In addition, the healthcare system is a regionally based national health service (known as Servizio Sanitario Nazionale), in 225 which public healthcare facilities strongly vary in terms of quality depending on the region. The presented approach can be easily applied to different levels of the spatial scale, provided the availability of data to calibrate the model.

With respect to the existing state of the art, the novel contributions of this paper are as follows:

• We define a networked SIR-based model (a SIRQTHE model) to represent the spread of COVID-19 in multi-region areas where the economic and healthcare systems are characterized by strong regional heterogeneity. More in detail, we extend the model proposed by [18] by defining a more accurate single-region epidemic model, where seven compartments are introduced (namely, Susceptible,

Infected, Removed, Quarantined, Threatened, Healed, and Extinct). We remark that, differently from the six-compartment model presented in [16] , in this work we split the compartment of recovered individual into two classes: the class of Healed people (i.e., recognized individuals that heal after transition in the status of Quarantined or Threatened), and that of Removed people (i.e., completely 240 recovered people that have never been detected). However, it is important to remark that, even employing more classes than the model in [16] , our approach has fewer connections between the classes and hence fewer parameters are to be estimated, to the benefit of the model simplicity. Moreover, while in [16] the model parameters are dynamically identified by splitting the fitting period, in our 245 work we identify time-varying functions to model time-varying parameters, such as the recovery and the death rate. Finally, another improvement with respect to the model presented in [16] is that we leverage on the Google mobility reports to better take into account the time-dependency of the infection rate.

• While in the related literature modeling and controlling the pandemic contagion 250 is typically addressed on a national basis, this paper proposes an optimal control approach that supports governments in defining the most effective strategies to be adopted during post-lockdown mitigation phases in a diversified multi-region scenario. The proposed approach allows policy makers taking targeted intervention decisions on different regions by an integrated and structured model, 255 thus both respecting the specific regional healthcare systems characteristics and improving the system-wide performance by the avoidance of uncoordinated behavior by individual regions.

• Differently from the related literature, where the addressed control strategies aim at minimizing the number of fatalities or ensuring that the healthcare systems is 260 not overloaded, this paper also focuses on the economic impact of the control strategies. Indeed, the approach aims at minimizing the cost of the mitigation strategies on a multi-region area, while ensuring that the capacity of the network of regional healthcare systems is not violated.

• By applying the proposed methodology to the network of Italian regions, we 265 show its effectiveness and flexibility in properly supporting the definition of effective regional strategies for managing the COVID-19 disease under different government policies. We discuss the results achieved by the MPC scheme when control actions within and on the border of regions are applied both by a regionby-region basis or by an inter-region coordination mechanism. In particular, we 270 analyze and compare the following scenarios: uniform intra-region activity and inter-region travel restrictions, differentiated intra-region activity restrictions and uniform inter-region travel restrictions, and differentiated intra-region activity and inter-region travel restrictions.

The rest of this work is structured as follows. In Section 2, we first present the 275 single-region SIRQTHE model, providing the formulation of the dynamic equations and the description of the parameters' identification process; subsequently we focus on the extension of the SIRQTHE model to a multi-region framework. In Section 3, we present the multi-region MPC framework, describing the control variables, the corresponding constraints, and the control objectives, and formulating the optimal control 280 problem. In Section 4, we show the numerical results of the simulations of the Italian country based on real data ( [29] ) and we provide a comparison with respect to the results obtained by benchmark control strategies. Finally, in Section 5, we conclude the paper and discuss an outlook for future works. 

Currently, governments all around the world are struggling to contain the COVID-19 pandemic. In this context, mathematical models are extremely valuable to simulate and control the spread of the pandemic. In fact, mathematical models are widely used to estimate the contagion parameters and predict the effects of any control action on 295 populations.

Compartmental models are traditionally considered suitable to model the spread of a virus within a large population [9] . In these models, the overall population is divided into different compartments, where people can flow from one compartment to another based on specific rate values. In particular, in the SIR-based models, we can assume that the dynamics of the pandemic is quicker than the dynamics of birth and death, therefore, the latter two events are usually omitted. Hence, the simplest SIR model can be expressed by the following set of ordinary differential equations [9] :

where N denotes the overall population, S, I, and R represent respectively the compartments of the Susceptible, Infected, and Removed (dead or recovered) individuals , β ∈ R + is the infection rate, and γ ∈ R + is the recovery rate. Since N corresponds to 300 the sum of compartments populations, it holds:

Consequently, in (1)-(3) it is sufficient to analyze only two equations since the third is dependent. Finally, it is to be noticed that, as an alternative to (1)-(3), the discrete-time version of the SIR can be straightforwardly formulated (since a comparison between the two 305 variants is out of the scope of this contribution, we refer the interested reader to the work in [30] for details). Independently from the choice of the time domain, the SIR model is able to characterize in broad terms the spread of a pandemic, thus disregarding the multiple and complex facets of a real pandemic. Consequently, several models have been proposed in the related literature to improve the classical SIR model for a specific 310 pandemic, for instance, by adding additional compartments and by better modeling the relation among them.

The proposed epidemiological model for the COVID-19 is a novel time-varying discrete-time model, named SIRQTHE, which distinguishes between detected and un-315 detected infected people, healed, dead, and hospitalized. As a major assumption, we suppose that the probability of becoming susceptible after being healed is negligible, which seems reasonable based on the current level of knowledge [31] .

More in detail, in our model, the overall population in a given region is divided into the following compartments: The overall interconnections between the above compartments are shown in Fig.1 . The SIRQTHE model is composed by seven time-varying difference equations, which characterize the flows of the individuals between the different compartments. More in detail, by designating all the state variables (the fraction of the overall population) by a Latin letter, and denoting the time step as k, the model is described as follows:

R(k + 1) =R(k) + γĨ(k) (7)

T (k + 1) = T (k) + µ Q(k) + λ I(k) − (π(k) + ε(k)) T (k)

where the state variables indicated with a tilde are those that cannot be directly observed with a reasonable confidence, i.e., there are no data from official sources [13] . Let us describe in detail the model parameters and the assumptions underlying the 330 model conceptualization. As shown in Fig. 1 , the seven classes are related by different parameters, which are able to capture the system dynamics. In particular, β(k) is the time-varying infection rate, whose value is strongly dependent on the population behavior and the social distancing measures. In Appendix A, we show that this parameter may be highly correlated with people's mobility. Parameter θ is the rate of 335 infected that are recognized and Quarantined, γ is the rate of healing when the Infected and unrecognized people do not need to be hospitalized, δ is the rate of healing of Quarantined people, λ is the rate of people that have been recognized only when a strong symptomatic condition occurs and therefore an immediate hospitalization is needed. Moreover, µ is the rate of Quarantined people that need to be hospitalized, π is the rate 340 of healing of the Threatened people and ε is the death rate.

The justification of this model construction lies in achieving a good compromise between the model accuracy, which allows representing all the facets of the pandemic diffusion, and the simplicity that helps identifying the characteristic parameters from the available data. As shown in Section 1, several papers on the spread of COVID- 345 19 pandemic indeed present compartmental models with more classes; however, these works mainly lack of an accurate identification of the model parameters. In several countries, the available data are scarce and are divided into few categories: this makes a more complex model hardly implementable in real life. As a consequence, some simplifying assumptions must be indeed made for the sake of preserving the model 350 practicality. In reference to the SIRQTHE model, we compress or eliminate some classes that are usually employed in more complex models. For instance, we consider simply a Quarantined people class not discerning from those asymptomatic and those with mild or strong symptoms. Moreover, we ignore that hospitalized individuals may in fact require different treatments; in fact, several models divide this class considering 355 an additional class that takes into account people needing intensive care treatments. In addition, we partially neglect the incubation time of the virus. Lastly, we do not use the so-called Exposed compartment, which contains the people that have been infected while they are not yet contagious. Despite these hypotheses, as shown in Appendix A, the proposed SIRQTHE model shows its effectiveness in the identification phase 360 based on a minimal set of measurable epidemiological data that are commonly available across countries.

In order to correctly represent the COVID-19 spread, a multi-region variant of the model in (5)-(11) is here proposed. In effect, a multi-region model is more reliable than a centralized model to reproduce the heterogeneous situation in multi-region areas with high-fidelity. In particular, as proposed in [18] , we generalize our SIRQTHE model to a multi-region case. By assuming the area under analysis is composed by M regions, whose index i varies in the set M = {1, . . . , M }, the equations related to the i-th region can be written as follows:

Comparing equations (5)-(11) with (12)- (18), we remark that in the latter formulation we mark the state variables related to region i with the corresponding index, and we 365 add a further term in the right-hand side of the first two difference equations to take the migration of individuals between regions into account. In particular, we use the timevarying coefficients ξ i,j (k), ∀i, j ∈ M to represent the inter-region mobility:

represents the rate of people leaving the region i at time k. We assume that all 370 the parameters ξ i,j (k) (∀j = i) get non-negative values; thus, ξ i,i (k) has a non-positive value equal to:

Note that (19) is derived by imposing the balance of the migrations flows between all the regions:

The complete model for a network composed by M regions can be written in matrix 375 form. First we define:

as the vectors containing all the state variables for each region. Then, we define the model parameters matrices as:

where all the parameter matrices are diagonal, except for matrix Ξ(k) of the migration coefficients between regions. Finally, the overall multi-region SIRQTHE model can be written as follows:

where the symbol • represents the operator of the component-wise product (i.e, the Hadamard product).

For the sake of clarity, Fig. 2 shows the schematic diagram of the multi-region SIRQTHE model, where the links between the different single-region SIRQTHE models highlight the migration fluxes in terms of exchanged Susceptible and Infected individuals.

Before introducing the multi-region MPC framework we preliminarily recall that the aim of this paper is to support decision makers in identifying the optimal control actions to mitigate the effect of secondary pandemic waves, when the basic NPIs actions (e.g., in terms of social-distancing and use of face-masks) are ineffective or such measures are relaxed, thus requiring some more restrictive measures. As an example, in China, restrictions were eased as cases declined, but by the mid of May 2020, new pandemic clusters were reported. In effect, although the adopted NPIs countermeasures reduced the number of reported cases, the absence of herd immunity against COVID-19 suggests that contagions could easily rise again when these interventions are relaxed, as business, factory operations, and schools resume. Therefore, as no vaccine is cur-400 rently available, we assume that any long term management of the COVID-19 spread should aim at reaching the heard immunity while not exceeding the regional healthcare capacity and limiting the loss of the regional economic systems. This goal can be reached by applying some interventions that are more restrictive than the basic NPIs [6] in accordance with an optimal control strategy, whose aim is to keep low the num-405 ber of fatalities while minimizing the effects on the economic framework. In addition, in multi-region areas with a strong regional heterogeneity, it is crucial to find targeted and coordinated optimal exit and restarting strategies on a regional basis to effectively cope with the possible onset of further epidemic waves, while efficiently returning economic activities to their standard level of intensity. As a consequence, in this section 410 we present an optimal control approach based on a receding horizon scheme, which supports regional governments in defining the most effective strategies to be adopted during post-lockdown mitigation phases in a multi-region scenario.

Throughout the rest of the paper, we consider the same length for both the prediction and control horizon. In particular, at the generic time instant h ∈ Z + the horizon

In this section, we introduce some different control actions that can be used to contain the impacts of the COVID-19 pandemic. On the one hand, we assume that the 420 parameters related to the healthcare system cannot be directly modified. On the other hand, we assume that any control action is focused only on reducing the parameters β i (k) and ξ i,j (k), i.e., the intra-region infection rate and the inter-region migration coefficient at each time k, respectively. These parameters are indeed controllable in some way. For instance, a reduction of the internal activities of region i would reduce significantly the corresponding β i (k), whilst travel restrictions between region i and region j will reduce coefficients ξ i,j (k). Therefore, we model two classes of control and mitigation interventions:

• intra-region activity restrictions;

• inter-region travel restrictions.

As for the first class, for each region i we preliminarily define a vector of control variables u i := u i (h : h + K − 1) that models the interventions on the activities performed within the given region (in terms of percentage reduction) over the given control horizon. In addition, we assume that the control action u i (k) related to the restriction of activities in region i at time k gets a finite set U i of discrete values. For 435 instance, when U i = {0, 0.2, 0.8}, u i (k) = 0.8 corresponds to a complete lockdown, u i (k) = 0 is the normal condition, and u i (k) = 0.2 corresponds to telework and closure of schools and universities in region i at time k. Subsequently, we assume that a reduction of the activity level linearly produces a decrease of the infection coefficient for all the regions [6] . Hence, we correlate the intra-region infection rate for time slot 440 k with the intra-region activity restriction measures in accordance with the following linear equation:

where β 0 i is the infection rate when no measures are applied. We model the second class of control actions similarly to the first class. For each region i we preliminarily define a vector of control variables r i := r i (h : h + K − 1) 445 that models the restrictions on the mobility from and towards the given region over the given control horizon. Moreover, we assume that the control action r i (k) related to the restriction of mobility from and towards region i can get a finite set R i of discrete values. For instance, in the case of on/off strategy, R i = {0, 1}: r i (k) = 1 corresponds to the situation where inbound and outbound mobility is forbidden, whilst r i (k) = 0 means that no inter-region travel restriction are imposed to region i at time k. Subsequently, we assume that a reduction of the inter-region mobility produces a linear decrease in the Susceptible and Infectedin region i (see [18] ) in accordance with the following linear equation:

where ξ 0 i,j denotes the coefficient of migration from the region j to the region i when 455 no mobility restrictions are applied. Finally, we denote as u := (u 1 , . . . , u M ) and r := (r 1 , . . . , r M ) the vectors collecting the intra-region activity and inter-region restriction strategies over all the regions in M, respectively. We assume that the intra-region activity and inter-region restrictions can be either applied on a region-by-region basis or by an inter-region co-460 ordination mechanism. Such a kind of policies can be reflected in the control system by properly defining constraint sets U and R on the given decision variables:

For instance, the control actions could be applied to the whole network of regions in accordance with the following policies:

• Uniform intra-region activity and inter-region travel restrictions. This case is 465 applicable to a multi-region structure controlled by an upper-level government that does not allow each individual region to implement differentiated control actions. Under such a policy, the definition of the constraint sets U and R is given by:

An example of application of this policy was implemented by the Italian gov-470 ernment during the so-called COVID-19 Phase 1: the lockdown and the boundary closure was simultaneously imposed to each Italian region, i.e.:

• Differentiated intra-region activity restrictions and uniform inter-region travel restrictions. This case is applicable to a multi-region structure where the regional 475 jurisdiction allows implementing individual control actions on internal activities without having an effect on the status of the regional boundaries. Under such a policy, the definition of the constraint sets U and R is characterized by coupling between the control variables related to different regions:

An example of applying this policy was implemented by the Italian govern-480 ment during the so-called Phase 2: all the regional boundaries were kept closed (r i (k) = 0, ∀i ∈ M), while each region determined the restarting strategies on a local basis.

• Differentiated intra-region activity and inter-region travel restrictions. This is the most general case of a multi-region structure where the regional jurisdiction 485 allows implementing individual control actions both on regional internal activities and boundaries. Under such a policy, in the definition of the constraint sets U and R there is no coupling between the control variables related to different regions: We finally remark that, to avoid too frequent and unpractical changes in the strategies, the control actions can be kept constant over a given period equal to ∆l = ω ∆k (i.e., for ω time slots). For instance, if ∆k is one day, it could be meaningful to set the periodicity of the control actions to one week (i.e., ω = 7). Assuming that K = L ω, with L ∈ N, the following additional constraints on the control actions are then introduced:

In Fig. 3 we show the different time intervals used to update the state model and the control actions.

The proposed MPC approach aims at simultaneously optimizing the objectives of all the regions. Thus, the objective function of the overall online optimization problem 500 is formulated as the summation of the single-region objective functions. In turn, the objective function related to region i -denoted as J i (∀i ∈ M) -is composed by multiple cost terms as follows:

where vectors S i ,

collect the predicted values of compartment members in region i over the control horizon (e.g., S i = S i (k : k + K − 1)). Note that 505 coefficients C T , C u i , and C r i have a twofold function: on the one hand, they provide a prioritization among the multiple cost terms in (37) ; on the other hand, they ensure that these terms are homogeneous (namely, they make the three terms dimensionless).

More in detail, the first term in (37) -weighted by coefficient C T -represents the cost incurred by the healthcare system of region i, which is consequent to the predicted 510 epidemic evolution over the whole control horizon. By properly assigning a very large value to C T , we ensure that the number of Threatened individuals T i (k) related to region i is lower than a prefixed maximum T max i for each time slot k. The second term in (37) -weighted by coefficient C u i -is the cost incurred by the regional economic system, as a result of applying the intra-region activity restriction 515 in region i over the entire control horizon. It assumes that there is a linear correlation between the level of restriction imposed to the internal activities and the loss of the regional economic productivity.

The third term in (37) -weighted by coefficient C r i -represents the cost incurred by the regional economic system, as a result of regulating the closure of boundaries 520 to region i over the entire control horizon. It assumes that there is a linear correlation between the level of restriction applied to the in-and out-bound mobility and the loss of the regional economic productivity, according to (30) .

Summing up, the objective function defined in (37) allows finding a control policy that minimizes the economic loss -quantified through the last two terms of (37)-and 525 simultaneously keeps the number of Threatened under a safety threshold -thanks to the presence of the first term of (37) .

Finally, note that weighting coefficients C u i and C r i are region-dependent: this ensures that regional policy makers can adjust these coefficient in accordance with the importance and priority that can be assigned to the above mentioned costs in each re-530 gion depending on local scenarios, according to (31) .

Having defined the state model, the control variables with the corresponding constraint set, and the objective function related to the online optimization, the optimal control problem is formulated as follows:

constraints on control variables (32) and (36) . (38) The optimization problem (38) has 2M K integer and 7M K real decision variables (i.e., u, r and S i ,

respectively); furthermore, it presents non-linearities both in the objective function and state model. Consequently, the optimization problem (38) is a mixed-integer non-linear programming (MINLP) problem.

Due to (36) , the optimization problem (38) is iteratively solved every ω time slots in accordance with the receding horizon paradigm (see Fig. 3 ), based on the most recent input data. Only the results referring to the first ω time slots are applied to the system as the optimal control signals, whilst the horizon is shifted ahead. Then, for the next group of ω time slots, a new optimization problem is solved using the updated information 545 on forecasts and system states. This results in the closed-loop control scheme shown in Fig. 4 . Note that the presented closed-loop feedback control technique may rely both on directly measurable and not directly measurable quantities. In fact, since not all SIRQTHE classes may be directly estimable, at each time shift a procedure for the identification of 550 the model parameters should be conducted. More in detail, at each time shift, the latest data related to the available classes should be used to update the remaining SIRQTHE parameters by employing the dynamical identification procedure (e.g., referring to the Italian scenario, see the identification description defined in Appendix A).

The MPC integrates both the control actions and multi-region epidemic models 555 (described in the previous sections), taking into account the mutual interaction between the effects of the intra-region activity and the inter-region mobility restrictions and the multi-region epidemic dynamics. The MPC law is defined in accordance with an output-feedback formulation. The optimization problem aims at determining the control actions (e.g., intra-region activity restrictions and the inter-region mobility re-560 strictions) for each region, whilst the measured responses coincide with the main epidemic parameters (e.g., number of hospitalized or Quarantined, level of mobility, etc.) monitored by the regional government agencies. Obviously, the estimation of all the variables influencing the epidemic dynamics in the network of regions (i.e., the variables that are not monitored by sensors), as well as the presence of disturbances, can 565 affect the accuracy of the model response. The application of the MPC strategy allows limiting the effects of such uncertainties thanks to the computation of feedback control actions that are based on periodical updates of the actual system state and on the prediction of its evolution in a rolling horizon approach.

In order to validate the optimal control approach of the COVID-19 outbreak, we test the proposed multi-region methodology on the Italian scenario. In particular, in this section we report and analyze the results of the optimal mitigation strategies on the network of M = 20 Italian regions, i.e., the second level administrative body in Italy. In effect, the Italian scenario suits well the proposed control approach due to two main 575 aspects. On the one hand, the Italian national healthcare system is regionally based. In fact, all the regions have a different level of quality for the healthcare facilities; in addition, each region has different regulations and policies regarding swabs, hospitalization, treatment, and prevention. On the other hand, the spread of COVID-19, and the consequently adopted containment measures, produced extremely heterogeneous 580 effects on the Italian regions: while the North of Italy, and particularly Lombardy, has been facing a tremendous amount of COVID-19 cases (with about one hundred thousand of confirmed cases and sixteen thousand of confirmed deaths as of August 13, 2020 [32] ), the South of Italy is experiencing a relatively stable situation.

First, based on the real data available for the Italian pandemic (see [29] ), we esti-585 mate the model parameters for both the single-region and the multi-region SIRQTHE model. Then, we discuss the long-term outcomes that can be achieved by applying the proposed optimal control approach for different restriction policies. We also provide a comparison with respect to the results obtained with three performance evaluation benchmarks, that is: (1) the minimization of the economic cost;

(2) the minimization 590 of the Threatened; (3) the threshold rule-based feedback control scheme inspired by the Italian government's existing protocol [33] .

This section reports the main results of the identification to be used as initial conditions for the considered control schemes. The detailed description of the fitting pro-595 cedure is reported in Appendix A.

The time slot ∆k is set to one day, whilst the simulation period is one year. Moreover, since a daily application of the restrictive measures would be unrealistic and impossible to be implemented in real-life, we assume that the control actions are implemented on a weekly basis (i.e., ∆l = ω ∆k, with ω = 7).

The initial number of Quarantined, Threatened, Recovered, and Extinct individuals are set to the values of August 13, 2020, according to the real data in [29] . Consequently, the corresponding number of Susceptible, Infected, and Removed individuals is estimated according to the identification procedure reported in Appendix A for the considered date. 605 We highlight that, since the aim of our work is to reduce the infection rate β(k) so as to contain the number of Threatened people under a well-defined maximum level T max i , we assume that β(k) is a function of the mobility level, computed on the basis of the daily Google mobility reports [34] . Note that T max i is specifically defined for each region, due to the high heterogeneity of the Italian healthcare system. Moreover, 610 we assume that parameters γ, δ, θ, λ, and µ are constant over the prediction horizon and are computed by means of the identification procedure described in Appendix A. Parameters π(k) and ε(k) are time-dependent; however, in the simulations, we set their value to the last fitted value. This hypothesis is reasonable because, although these two parameters are highly variable during the first stage of the pandemic, for long-time 615 observations they settle to a stable value, as shown in Appendix A. Finally, the timevarying migration coefficients ξ i,j (k) (∀i, j ∈ M) are adapted from [35] .

The optimal control problem (38) is implemented in the Matlab environment [36] using the Global Optimization toolbox on a laptop equipped with a 1.3 GHz Intel Core i5 CPU and 8 GB RAM. Since problem (38) falls into the class of MINLP problems, its resolution is non trivial due both to its combinatorial complexity and non-linearity. Therefore, a two-step genetic algorithm approach is here applied for the resolution of (38) . In the first phase, being n the number of control variables, we perform 1, 000 n parallel computations of the genetic algorithm with 1, 000 n generations, i.e., an initial population size of 1, 000 n. In the second phase, the outcomes of the first phase are 625 used as the initial population of an additional genetic optimization process.

As for the objective function of (38), the cost coefficients C u i and C r i are set for each region based on the corresponding per capita regional Gross Domestic Product (GDP) normalized by the per capita national GDP (see Appendix B). The coefficient C T is much bigger than C u i and C r i , i.e., it is considered equal to 10,000. In fact, we assume 630 that C T is much higher than the other coefficients because our objective is to keep the number of Threatened cases below a maximum level T max i , i.e., the first term represents a soft constraint. We do not impose such a condition by means of an explicit additional constraint because, with some settings for the model (e.g., a high infection rate), the problem may become infeasible. Note that, on the basis of the Italian data [32] , we 635 set T max i equal to three times the number of Intensive Care Units (ICUs), reported in Appendix B.

Finally, we assume that three different intra-region activity restrictions can be implemented by each region at each time k: lockdown (i.e., u i (k) = 0.8), partial lockdown corresponding to the closure of specific activities, such as universities and schools 

In this section we show the results obtained by the proposed MPC over the given simulation period of one year using a prediction horizon of eight weeks (i.e., K = 56, L = 8) with the application of the following restriction policies:

• U-U policy : Uniform intra-region activity restrictions and Uniform inter-region travel restrictions.

• D-U policy: Differentiated intra-region activity restrictions and Uniform interregion travel restrictions.

• D-D policy: Differentiated intra-region activity restrictions and Differentiated inter-region travel restrictions.

As a first outcome, we analyze all the considered restriction policies using several 655 sets of weights by changing the relative importance of intra-region activities with respect to inter-region mobility (that is, by varying parameter α in C r i = αC u i , ∀i ∈ M). Figures 5-7 show the results obtained for the three above policies when α = 1. The blue line represents the time evolution of the Threatened cases, the red line represents the time evolution of the intra-region control actions, and the green dotted line represents the time evolution of the inter-region travel restrictions, while the black line represents the maximum number of Threatened cases that can be treated by the i-th region (i.e., T max i ).

In particular, Fig. 5 highlights that, by adopting the U-U policy, the trend of the control actions is the same for all regions, while the time evolution of the Threatened 665 cases largely varies depending on the considered region. Note that such a policy seems to be unnecessary in some regions, such as Aosta and Trentino-South Tyrol that present a very small amount of Threatened cases with respect to the remaining regions. We highlight that with this policy only the intra-region control policy are applied. Finally, the overall optimal value of the objective function in problem (38) is equal to 462.85. 670 Figure 6 (D-U policy, with α = 1) shows that the trend of the control actions differs from region to region and the time evolution of the Threatened cases still largely varies depending on the considered region. It has to be noticed that, similarly to Policy U-U, Policy D-U imposes that the regional borders must be open during the observing period. On the contrary, differently from Policy U-U, the intra-region control actions 675 are set coherently to the specific pandemic evolution occurring in each region. In fact, in Aosta no unnecessary restrictions are applied on the intra-region activities. It is also important to remark that the overall optimal value of the objective function in problem (38) is equal to 191.31, that is, about 60% lower than in the Policy U-U, thanks to the differentiated and coherent intra-region control actions. 680 Moreover, Fig. 7 (D-D policy, with α = 1) shows that the control actions vary from region to region as well as the time evolution of the Threatened cases, but the trend now differs from the previous policies because also the inter-regional control actions are applied. In this case, both intra-region and inter-region control actions are coherently applied depending on the specific pandemic evolution occurring in each 685 region and depending on its economic framework. It is also important to remark that the overall optimal value of the objective function in problem (38) is equal to 171.70, which is the lowest value among the considered policies. In effect, the differentiated intra-and inter-region control actions are coherently set, thus leading to the minimum objective function value for the regional healthcare and economic system. 690 However, we remark that the emerged finding is an expected outcome. Indeed, the D-D policy presents the least restricted control action set with respect to the D-U and U-U policies, thus leading to a global objective function value that is lower than or equal to that of the other policies (i.e., from a global perspective, the D-D policy generally outperforms the other two policies). Nevertheless, since the objective func-695 tion (37) is composed by the weighted summation of multiple terms, this conclusion is no more valid when analyzing and comparing the single contributions of the objective function for the three policies. Consequently, the various policies have to be evaluated using further performance indicators different from the composite objective function value, such as the average number of Threatened people and the duration of the control 700 actions. Such a comparative analysis is indeed very useful to policy-makers in supporting the choice of the most suitable strategy to be implemented. As matter of fact, the selection of the best performing policy (i.e., the D-D) is not so obvious due to various contributing reasons related to the complexity of the decision making context. As a consequence, quantifying the gap between the best performing policy and the others 705 and comparing the different policies from various individual points of view actually can help the policy-makers in deeply analyzing and choosing the most effective action by taking into account the different restrictions that could be implemented in a country or a region. Therefore, in order to effectively evaluate the different control policies, in Tables • the average number of Threatened individuals over the control horizon for all the regions and the whole Italy;

• the duration of the lockdown (measured in days) for all the regions;

• the duration of the partial lockdown (measured in days) for all the regions;

• the total number of control action switches, both for intra-and inter-region control actions;

• the duration of the border closure (measured in days) for all the regions;

• the overall cost of the policies over the control horizon for all the regions and the 720 whole Italy.

By analyzing Table 1 , it can be observed that, if a similar importance weight is assigned to both the intra-and inter-region actions (i.e., for α = 1), then the average number of Threatened people assumes the lowest value with the U-U policy, thus making it preferable from a social perspective, whereas the average duration of the 725 lockdown is the highest with the lowest number of control actions switches. Similar findings arise when analyzing Tables 2 and 3, which respectively refer to inter-region actions that are twice (i.e., α = 2) and half (i.e., α = 0.5) important than intra-region ones. As a result, we can conclude that, although the D-D policy, thanks to differentiated actions on intra-region activity and inter-region mobility restrictions, ensures the lowest global objective function value, it does not guarantee the lowest number of Threatened cases. Figure 6 : Policy D-U , α = 1: Threatened cases (blue line), intra-region activity (red line) and inter-region travel (green dotted line) restrictions, the maximum number of Threatened cases that can be treated by the region (black line). For each region, the x-axis reports the time in days, the left y-axis represents the number of Threatened individuals, and the right y-axis represents the value of the control actions. 

In this section, we introduce some reference strategies to compare the examining the effectiveness of the proposed optimal control approach. In particular, three simple 735 benchmark strategies are defined:

• Benchmark control strategy 1: No restrictions are applied for the whole simulation period, ignoring the impact on the healthcare system.

• Benchmark control strategy 2: All restrictions are applied for the whole simulation period, i.e., all the intra-region and inter-region restrictions are used at 740 their maximum value to control and flatten the number of cases, ignoring any economic impact.

• Benchmark control strategy 3: At the beginning of each week, all restrictions are applied in a single region if the number of Threatened is higher than a safety threshold.

Note that the first two strategies represent two ideal extreme cases, since they correspond to take only the economic or the health perspective into account, respectively. Instead, the third strategy is a simple but at the same time realistic feedback control strategy to mitigate the COVID-19 outbreak. Figures 8-10 show the results obtained for the three benchmark control strategies. The blue line represents the time evolu-750 tion of the Threatened cases, the red line represents the time evolution of the intraregion control actions, and the green dotted line represents the time evolution of the inter-region travel restrictions, while the black line represents the maximum number of Threatened cases that can be treated by the i-th region (i.e., T max i ). In Fig. 8 we show the results of benchmark control strategy 1, i.e., the case where 755 no restrictions are applied and the control actions at each week are null (u i (k) = 0 and r i (k) = 0 for each region i for each time k). The reported graph highlights that with the current model parameters, applying no control actions would result in an overload of the national healthcare system in almost all Italian regions. At the same time, this solution corresponds to the minimization of only the last two terms of (37), i.e., only 760 the economic part.

Conversely, in Fig. 9 we show the results of benchmark control strategy 2, i.e., the case where all the restrictions are applied at their maximum value (u i (k) = 0.8 and r i (k) = 1 for each region i at each time k of the simulation period). In this case the number of Infected falls to zero; however, the economic cost is the highest (i.e., 765 19603.17).

Finally, in Fig. 10 , we show the results of benchmark control strategy 3. In this approach, the control actions are applied when the number of Threatened cases is higher than a predetermined threshold. More in detail, at the end of each week, if the Threatened cases are higher than the threshold level, all the measures (intra and inter-region 770 control actions) are applied in the subsequent week. This strategy is similar to the protocol implemented by the Italian government [33] . In fact, the current Italian protocol considers safety thresholds on the number of infected and Threatened. When these thresholds are exceeded in a specific region, the restrictive measures are applied to reduce the number of cases. The threshold directly influences the total economic cost and 775 the number of Threatened people. In fact, a high threshold leads to a large number of Threatened cases that may exceed the maximum capacity. Conversely, a low threshold would keep the number of cases low by having a higher economic cost. In our simulation, we assume a safety level for all the regions equal to 0.8T max i ; this results in an economic cost of 830.33., i.e., only the last two terms of (37). Moreover, the figure   780 shows that this strategy is ineffective in containing the number of Threatened cases. Figure 9 : Benchmark control strategy 2: Threatened cases (blue line), intra-region activity (red line) and inter-region travel (green dotted line) restrictions, and the maximum number of Threatened that can be treated by the region (black line). For each region, the x-axis reports the time in days, the left y-axis represents the number of Threatened, and the right y-axis represents the value of the control actions. Figure 10 : Benchmark control strategy 3: Threatened cases (blue line), intra-region activity (red line) and inter-region travel (green dotted line) restrictions, and the maximum number of Threatened that can be treated by the region (black line). For each region, the x-axis reports the time in days, the left y-axis represents the number of Threatened, and the right y-axis represents the value of the control actions.

We now compare the results of the MPC approach (Section 4.2) with respect to the above results. Disregarding the difference between the three different policies (i.e., U-U, D-U, and D-D), the objective function value is always significantly lower than that of all the benchmark control strategies, due to different factors as detailed in the 785 sequel.

Comparing the proposed MPC results with the benchmark control strategy 1, it is apparent that, since no restrictions are applied, the number of Threatened largely exceeds the maximum limit in all the regions in the latter case. Hence, although the economic cost is null, the global objective function value is much higher than that of 790 the proposed MPC approach.

Referring to the benchmark control strategy 2, the number of Threatened is kept at the lowest value. Nevertheless, since the economic aspect is ignored, the last two parts of the objective function largely arise, thus leading to a higher global objective function value with respect to that of the presented MPC method.

The comparison between the proposed MPC and the benchmark control strategy 3 is of particular interest. The latter shows an economic cost, which is computed by taking into account only the last two parts of the objective function, four times higher than that of the D-D policy and two times higher than that of the U-U one. In addition, since the benchmark strategy 3 is unable to keep the number of Threatened below the 800 maximum limit in all the regions, the first part of the objective function makes the total cost much higher than the proposed MPC approach.

Summing up, from the above simulations it arises that the proposed optimal control approach provides the best compromise between the two most important governmental aspects (that is, the economic and the social features of interventions), being able to 805 keep the number of Threatened cases below a maximum limit while minimizing the economic cost of the eventually required lockdown periods. Consequently, the proposed methodology represents a useful support tool for policy-makers to mitigate the COVID-19 outbreak in case of secondary pandemic waves. 810 In this paper, we propose a novel feedback control strategy aimed at supporting policy-makers in efficiently mitigating the effects of COVID-19 pandemic contagions in multi-region areas, such as Italy, where the contagion peak has been reached and, in a post-lockdown phase, coordinated regional restarting strategies are needed.

The presented methodology makes joint use of an epidemiological SIR-based model

(namely a SIRQTHE model) in conjunction with a non-linear Model Predictive Control (MPC) approach, with the final aim of minimizing the cost of the adopted mitigation strategies, while ensuring that the capacity of the regional healthcare systems is not violated. First, the SIRQTHE model allows to consider seven compartments of individuals (that is, Susceptible,Infected, Removed, Quarantined, Threatened, Healed, and 820 Extinct), thus ensuring a detailed representation of the pandemic dynamics, which is further guaranteed thanks to the definition of time-varying parameters. In addition, the multi-region framework allows to simultaneously take into account both the actions taken at a national level in terms of border activities between the regions and the specific strategies undertaken at each regional level. On the one hand, the proposed 825 approach fills a gap in the existing literature, where in reference to a multi-region scenario there is a lack of investigations on optimal control approaches aimed at effectively coping with possible onset of further epidemic waves, while efficiently returning economic activities to the standard level of intensity. On the other hand, the application to the network of Italian regions based on real data-sets highlights the planning utility 830 and flexibility of the proposed MPC approach in determining differentiated, as well as coordinated, optimal control actions over all the given regions. An additional merit of the developed approach is in its generalizability to different levels of spatial scale. Whilst in the presented numerical experiments we choose the network of Italian regions as the multi-region area under control, an interesting devel-835 opment of the research can be to apply the MPC approach to determine the differentiated, as well as coordinated, optimal control policies of higher-and lower-granularity networks, such as, for instance, the counties in the European Union or the districts within the same region.

Nonetheless, this study is not without limitations, which still need to be investi-840 gated in future works. In particular, the main limitation of the proposed framework relies on the centralized computation of the regional optimal control policies. Due to economical, political, and societal reasons, a centralized control approach may not be appealing in the epidemic control of a multi-region framework. At the same time, since all the regions are coupled by states and inputs, the cooperation between regions should 845 be encouraged to improve the system-wide performance through the avoidance of uncoordinated behavior of individual regions. Hence, the optimization of the regional strategies may be preferably performed through a cooperative distributed framework.

Therefore, our future work will mainly be devoted to define an iterative mechanism to determine the regional optimal control policies in a cooperative distributed setting.

Finally, one may observe that results and implications are derived from a simple model of control and mitigation actions, since we consider a finite set of intra-region activity and inter-region mobility restrictions producing additive effects on the reduction of infection. Actually, this limitation is only apparent, since the proposed model can be easily generalized to more complex cases by adding other terms in the objective 855 function and constraints to deal with the eventual finer control and mitigation actions producing combined non-linear effects on the analyzed epidemic dynamics.

In this Appendix we explain how we calibrate the SIRQTHE model for the considered numerical experiments. In fact, the genuineness of any model depends on the 860 number and quality of raw data adopted in the selection of variables to be used and parameters to be calibrated. Therefore, any model should rely as much as possible on the available data and on the prior knowledge of the system to model.

The Italian State body largely responsible for the management of natural disasters 865 and catastrophes is the Protezione Civile [32] (i.e., the Italian Civil Protection Department) [32] . During a pandemic emergency, the Protezione Civile is responsible for collecting and elaborating all the data related with the pandemic. More in detail, several data (and analyses) on a daily basis are available for the COVID-19 pandemic [29] :

• total cases: cumulative number of people infected, also comprehending healed and dead;

• hospitalized in a non-critical situation: number of people that are identified as currently infected and as such they are hospitalized, although their situation is not critical;

• hospitalized in a critical situation: number of people that are currently hospitalized in an Intensive Care Unit because of their severe situation;

• in quarantine: number of people recognized as infected; however, due to their mild or absent symptoms, they are legally obliged to stay in isolation at home;

• healed: number of people recognized as infected, but dismissed because healed;

• deceased: number of people dead because of COVID-19;

• swabs: number of swabs made by the national healthcare system.

Besides the above data, in order to perform a realistic and fruitful selection of the model parameters for the whole Italian outbreak (i.e., from February until to the post-lockdown restarting phase in June) we should not neglect that all parameters are 885 variable and time-dependent, especially during the first spread of the virus in a country.

In the literature, a pragmatic and widely adopted approach to overcome this limitation consists in fitting the model by employing different time windows, which reflect the different stages of the epidemic, and to set the parameters constant within each of these periods [37] . This leads to an acceptable fitting only when additional information 890 is included in the fitting phase. For instance, it is possible to add constraints on the parameters based on the available scientific knowledge.

Conversely, in this paper, we fit the model by assuming that the required parameters can be predicted with some time-dependent functions. Moreover, the following assumptions are made on the parameters:

• β i (k) (i.e., the infection rate of region i at time k): in the related literature (see, e.g., the work in [35] ), this parameter typically ranges between 0.35 and 0.55 in the absence of social distancing policies and people awareness. However, this value highly decreases during lockdown periods. In our work, to perform a continuous fitting of the COVID-19 from March to June (that is, during the post-900 lockdown phase), we correlate the evolution of β with the evolution of people's mobility computed on the basis of the daily Google mobility reports [34] , aggregated on a weekly basis. Therefore, the infection rate at the week w is defined as:

where β 0 i is the infection rate in region i when no containment measures are 905 applied, M i (·) is the weekly mobility level (i.e., M i (·) is a function of weeks and it ranges between 0 and 1) in region i, and d i is the delay of effects that mobility has on the contagion in region i. Note that, since the effects of social restrictions are not immediate, after a sensitivity analysis on the available data we consider a delay of one week in all the regions. Hence, we set d i = 1, ∀i ∈ M. Also note 910 that the use of a different β 0 i for each region would help the model fitting model, since β 0 i is likely to increase in the regions with a higher population density (we refer the interested reader to the work in [38] ). Nevertheless, for the sake of limiting the number of parameters, we set for all the Italian regions a uniform value for this parameter: β 0 i = β 0 , ∀i ∈ M.

• γ and δ (i.e., the rate of healing of unrecognized infected people who do not need hospitalization and the rate of healing of Quarantined people who do not need hospitalization, respectively): the value assigned to these parameters can be approximated by a constant since currently there is no proof that the virus has mutated. In theory, these two parameters have different values: in several 920 countries, someone may leave the quarantine only after two negative swabs, i.e., a person may be forced to be in quarantine even after clinically healed. Nevertheless, for the sake of reducing the model parameters in the identification phase, we assume, without loss of generality, that these two parameters equal. In particular, the literature findings show that setting γ = 1/14 is appropriate (see, e.g., 925 the work in [39] ) and therefore we assume this value for all the regions.

• θ i (i.e., the rate of infected people that are recognized and Quarantined of region i): this parameter is mainly related with the specific policy adopted by each region and the number of laboratory testing capacities (e.g., in terms of tested swabs). However, when the laboratory limits were reached at the beginning of 930 March the number of tested swabs became constant. Therefore, we assume this parameter as time-independent, but variable from region to region.

• λ and µ (i.e., the rate of people recognized only when strong symptomatic conditions occur and the rate of Quarantined people to be hospitalized, respectively): both parameters represent the rates of people that need to be hospitalized. As 935 these parameters are only related with the virus nature, we assume they are both constant and equal for all the regions.

• π i (k) (i.e., the recovery rate of region i at time k): this parameter is far from being constant during the spread of COVID-19. The national healthcare system may not be prepared and does not have therapeutic procedures for patients with 940 symptoms that have never been seen before. This means that initially the evolution of the number of recoveries can be slow until a constant value is reached, i.e., when the healthcare system becomes ready and prepared. In particular, on the basis of the performed analyses, we impose the parameter π to be time-dependent with the following formulation for each region:

• ε i (k) (i.e., the death rate of region i at time k): the number of deaths is not constant and hopefully decreases with time. This is mainly due to the availability of new clinical treatments. Furthermore, at the beginning of an epidemic, when the screening of infected people is low, only the sever symptoms cases are recognized and treated. With the ongoing of the pandemic, more cases are recognized 950 as infected, and hence the percentage of severe symptoms cases becomes lower. Therefore, taking into account the performed analyses, we set parameter ε i as time-dependent with the following formulation for each region:

ε i (k) = a 4,i + exp (−a 5,i (k + a 6,i )) (A.3)

• ξ i,j (k) (i.e., the inter-region mobility coefficient between regions i and j at time k): these parameters are adapted from [35] due to the different model assump-955 tions; in detail, the aforementioned work considers the interaction between the people between different regions. In contrast, we consider physical migration between different classes.

In order to estimate the parameters for the Single-region SIRQTHE model, we adopt 960 a pragmatic approach with a least-squares optimization technique based on the real data combined with hard constraints to enforce the prior knowledge on the system.

The containment measures of the pandemic in Italy can be divided into two main phases, one following February 23, which mainly concerned Northern Italy, and a second one following March 09, including the more restrictive measures affecting the 965 whole national territory. The first restrictive measures, which comprehend the closure of schools, universities, and bars and restaurants after 6 p.m., had limited effects on the contagion dynamics and, as the crisis worsened, the need for more severe restrictions motivated the second phase, which turned into a total lockdown where all the non-essential production activities were shut down. Therefore, in this phase, the Ital-970 ian scenario turns into a network of isolated regions, corresponding to the set of M Single-region SIRQTHE models. These models are fitted through the data related to the period when the movements between regions were not permitted in Italy (i.e., from March 15 to May 31). Each regional decoupled model is composed by seven equations and seven parameters, which are reduced to six equations and six parameters since the 975 Removed are calculated in accordance with (4) adapted to the SIRQTHE discrete-time seven-compartments model. Since two of these equations are highly non-linear, the fitting procedure is highly non-convex; indeed, as proposed in [16] , improved results can be obtained by splitting each regional model into two simpler sub-models and thus performing a two-stage fitting procedure.

In the first stage we analyze the following sub-model:

where C(k) = Q(k) + T (k) + H(k) + E(k) is the cumulative number of Infected people. It is clear that β 0 , τ = θ + λ, and the initial conditionĨ(0) are the only parameters to be estimated. The estimation of such parameters consists in minimizing the mean squared error (MSE) of the model with respect to the real data, which is defined as: In the second stage, we analyze the following sub-model:

T (k + 1) = T (k) + µ Q(k) + λ I(k) − (π(k) + ε(k)) T (k) (A.9)

H(k + 1) = H(k) + γ Q(k) + π(k) T (k) (A.10)

where we assume that the parameters π(k) and ε(k) are time-dependent. The estimation of the parameters of the Multi-region SIRQTHE model follows a two-stage fitting procedure similar to that described in Appendix A.2, except for the differences highlighted in the sequel. First, we note that, since the multi-region model takes the migrations between regions into account, the corresponding fitting window includes also periods when the inter-region borders are opened in Italy (i.e., from February 29 to June 3). Second, instead of individually identifying the parameters of independent single-region sub-models (A.4)-(A.6), we simultaneously fit the parameters of the entire network of regional sub-models coupled by the migration coefficients ξ i,j (k). These coefficients are adapted from [16] : in particular, the authors in [16] use two sets of average number of people migrated from region i to region j, respectively referred the lockdown and post-lockdown phases; in our model we normalize the data used in [16] with respect to the regional population in order to determine the migration rate ξ i,j (k) (∀i = j ∈ M) and we compute ξ i,i (k) (∀i ∈ M) through (19) . 

Mathematical assessment of the impact of nonpharmaceutical interventions on curtailing the 2019 novel coronavirus

First-wave covid-19 transmissibility and 1025 severity in china outside hubei after control measures, and second-wave scenario planning: a modelling impact assessment

A new view of multiscale stochastic impulsive systems for modeling and control of epidemics

Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy

1035 Impact of non-pharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand

Can the COVID-19 epidemic be controlled on the basis of daily test reports?

On fast multi-shot epidemic interventions for post lock-down mitigation: Implications for simple covid-19 models

The mathematics of infectious diseases

Analysis and control of epidemics: 1045 A survey of spreading processes on complex networks

On the dynamics of deterministic epidemic propagation over networks

Unraveling r 0: Considerations for public health applications

A modified SIR model for the COVID-19 contagion in Italy

Spread and dynamics of the covid-19 epidemic in italy: Effects of emergency containment measures

Modeling the epidemic dynamics and control of COVID-19 1060 outbreak in China

Intermittent yet coordinated regional strategies can alleviate the COVID-19 epidemic: A network model of the Italian case

Expected impact of 1065 school closure and telework to mitigate covid-19 epidemic in france

A multi-region variant of the SIR model

Ro-1070 bust and optimal predictive control of the COVID-19 outbreak

Dynamic control of modern, network-based epidemic models

Vaccination models and optimal control strategies to dengue

Mathematical modelling, simulation, and optimal control of the 2014 ebola outbreak in west africa

Optimal control for a tuberculosis model with reinfection and post-exposure interventions

Applying optimal control theory to complex epidemiological models to inform real-world dis-1085 ease management

Model predictive control design for linear parameter varying systems: A survey

Model predictive 1090 control for thermal comfort optimization in building energy management systems

COVID-19: From model prediction to model predictive control

Robust economic model predictive control of continuous-time epidemic processes

The Civil Protection Department COVID-19 dashboard

Discrete-time vs. continuous-time epidemic models in networks

Presumed asymptomatic carrier transmission of covid-19

The Civil Protection Department website

Intermittent yet coordinated regional strategies can alleviate the covid-19 epidemic: a network model of the italian case

On fast multi-shot epidemic interventions for post lock-down mitigation: Implications for simple covid-19 models

The scaling of contact rates with popula-1125 tion density for the infectious disease models

The challenges of modeling and forecasting the spread of covid-19

The Italian Ministry of Health

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.