key: cord-1015070-ihjeblqr authors: Gallo, Luca; Frasca, Mattia; Latora, Vito; Russo, Giovanni title: Lack of practical identifiability may hamper reliable predictions in COVID-19 epidemic models date: 2022-01-19 journal: Science advances DOI: 10.1126/sciadv.abg5234 sha: e1e1c0568b5f07dd82c1568a5b43cea58ff20388 doc_id: 1015070 cord_uid: ihjeblqr Compartmental models are widely adopted to describe and predict the spreading of infectious diseases. The unknown parameters of these models need to be estimated from the data. Furthermore, when some of the model variables are not empirically accessible, as in the case of asymptomatic carriers of coronavirus disease 2019 (COVID-19), they have to be obtained as an outcome of the model. Here, we introduce a framework to quantify how the uncertainty in the data affects the determination of the parameters and the evolution of the unmeasured variables of a given model. We illustrate how the method is able to characterize different regimes of identifiability, even in models with few compartments. Last, we discuss how the lack of identifiability in a realistic model for COVID-19 may prevent reliable predictions of the epidemic dynamics. The pandemic caused by severe acute respiratory syndrome coronavirus-2 is challenging humanity in an unprecedented way (1) , with the disease, which in a few months has spread around the world, affecting large parts of the population (2, 3) and often requiring hospitalization or even intensive care (4, 5) . Mitigating the impact of coronavirus disease 2019 (COVID-19) urges synergistic efforts to understand, predict, and control the many, often elusive, facets of the complex phenomenon of the spreading of a previously unknown virus, from RNA sequencing to the study of the virus pathogenicity and transmissibility (6, 7) to the definition of suitable epidemic spreading models (8) and the investigation of nonpharmaceutical intervention policies and containment measures (9) (10) (11) (12) . In particular, a large number of epidemic models have recently been proposed to describe the evolution of COVID-19 and evaluate the effectiveness of different counteracting measures, including social distancing, testing, and contact tracing (13) (14) (15) (16) (17) (18) (19) . However, even the adoption of well-consolidated modeling techniques, such as the use of mechanistic models at the population level based on compartments, poses fundamental problems. First of all, the very same choice of the dynamical variables to use in a compartmental model is crucial; as such, variables should adequately capture the spreading mechanisms and need to be tailored to the specific disease. This step is not straightforward, especially when the spreading mechanisms of the disease are still unknown or only partially identified. In addition, some of the variables considered might be difficult to measure and track as, for instance, in the case of COVID-19, it occurred in the number of individuals showing mild or no symptoms. Second, compartmental models, usually, involve a number of parameters, including the initial values of the unmeasured variables, which are not known and need to be estimated from data. Having at disposal a large amount of data, unfortunately, does not simplify the problem of parameter estimation and prediction of unmeasured states. Once a model is formulated, it may occur that some of its unknown parameters are intrinsically impossible to determine from the measured variables or that they are numerically very sensitive to the measurements themselves. In the first case, it is the very same structure of the model to hamper parameter estimation as the system admits infinitely many sets of parameters that fit the data equally well; for this reason, this problem is referred to as structural identifiability (20, 21) . In the second case, although under ideal conditions (i.e., noise-free data and error-free models) the problem of parameter estimation can be uniquely solved for some trajectories, it may be numerically ill conditioned, such that from a practical point of view, the parameters cannot be determined with precision even if the model is structurally identifiable (22) . This situation typically occurs when large changes in the parameters entail a small variation of the measured variables, such that two similar trajectories may correspond to very different parameters (23) . The term practical identifiability is adopted in this case. Identifiability in general represents an important property of a dynamical system, as in a nonidentifiable system, different sets of parameters can produce the same or very similar fits of the data. Consequently, predictions from a nonidentifiable system become unreliable. In the context of epidemics forecasting, this means that even if the model considered is able to reproduce the measured variables, a large uncertainty may affect the estimated values of the parameters and the predicted evolution of the unmeasured variables (24) . The problem of practical identifiability of model parameters has been investigated using different methodologies based on Fisher's information theory (25, 26) , profile likelihood (27) , Monte Carlo simulations (28) , and other computational approaches (29) . However, the lack of practical identifiability can also affect the reliability of the prediction of the unmeasured variables dynamics (27) , an issue of utmost importance in the context of COVID-19, which nevertheless still requires a systematic investigation. In particular, an approach to simultaneously characterize the problem of sensitivity to parameters and that of the reliability of predictions of unmeasured variables is still missing. In more detail, in this paper, we investigate the problem of the practical identifiability of dynamical systems whose state includes not only measurable but also hidden variables, as is the case of compartment models for COVID-19 epidemic. We present a general framework to quantify not only the sensitivity of the measured variables of a given model on its parameters but also the sensitivity of the unmeasured variables on the parameters and on the measured variables. This will allow us to introduce the notion of practical identifiability of the hidden variables of a model. As a relevant and timely application, we show the variety of different regimes and levels of identifiability that can appear in epidemic models, even in the simplest case of a four compartment system. Last, we study the actual effects of the lack of practical identifiability in more sophisticated models introduced for COVID-19. Consider the n-dimensional dynamical system described by the following equations where we have partitioned the state variables into two sets: the variables m ∈ ℝ nm that can be empirically accessed (measurable variables) and those h ∈ ℝ nh , with n m + n h = n, that cannot be measured (hidden). The dynamics of the system is governed by the two Lipschitzcontinuous functions f and g, which also depend on a vector of structural parameters q ∈  q ⊂ ℝ nq . The trajectories m(t) and h(t) of system in eq. 1 are uniquely determined by the structural parameters q and by the initial conditions m(0) = m 0 , h(0) = h 0 . Here, we assume that some of the quantities q are known, while the others are not known and need to be determined by fitting the trajectories of measurable variables m(t). We denote by p ∈  p ⊂ ℝ np , the set of unknown parameters that identify the trajectories, which comprises the unknown terms of q and the unknown initial conditions h 0 . The initial values of the hidden variables are not known and act indeed as parameters for the trajectories generated by system in eq. 1. The initial conditions of the measurable variables m 0 may be considered fitting parameters as well. System in eq. 1 is said to be structurally identifiable when the measured variables satisfy (21) for almost any p ∈  p . Notice that, as a consequence of the existence and uniqueness theorem for the initial value problem, if system in eq. 1 is structurally identifiable, the hidden variables can also be uniquely determined. Structural identifiability guarantees that two different sets of parameters do not lead to the same time course for the measured variables. When this condition is not met, one cannot uniquely associate a data fit to a specific set of parameters or, equivalently, recover the parameters from the measured variables (23) . Structural identifiability, however, is a necessary but not sufficient condition for parameters estimation, so that when it comes to use a dynamical system as a model of a real phenomenon, it is fundamental to quantify the practical identifiability of the dynamical system. To do this, we consider a solution, m (t ) = m(t, p ̄ ) and h ̄ (t ) = h(t, p ̄ ) , obtained from parameters p = p ̄ , and we explore how much the functions m(t) and h(t) change as we vary the parameters p ̄ by a small amount p. To first order approximation in the perturbation of the parameters, we have m = m _ p p + O(∥ p ∥ 2 ) and h = h _ p p + O(∥ p ∥ 2 ) . Hence, by dropping the higher order terms, we have where the entries of the sensitivity matrices M = M( p ) ∈ ℝ n p × n p and H = H( p ) ∈ ℝ n p × n p for the measured and unmeasured variables are defined as Note that these matrices are positive semidefinite by construction. The smallest change in the measured variables m(t) will take place if p is aligned along the eigenvector v 1 of M corresponding to the smallest eigenvalue  1 (M). Hence, we can consider  = √ _  1 (M) to quantify the sensitivity of the measured variables to the parameters. Practical identifiability requires high values of  as these indicate cases where small changes in the parameters may produce considerable variations of the measurable variables, and therefore, the estimation of the model parameters from fitting is more reliable. Suppose now we consider a perturbation, p 1 , of the parameters aligned along the direction of v 1 . We can evaluate the change in h(t) due to this perturbation by The value of  quantifies the sensitivity of the hidden variables to the parameters of the model, when these parameters are estimated from the fitting of the observed variables since ∥h ∥ =  ∥ p 1 ∥. Notice that in this case, and differently from , lower values of  are desirable because they imply a better prediction on the hidden variables. Last, with the help of the sensitivity matrices defined above, we can also evaluate the sensitivity of the hidden variables to the measured variables as This parameter is of particular relevance here, since it provides a bound on how the uncertainty on the measured variables affects the evolution of the hidden variables. In addition, the parameter  2 can be efficiently computed as it corresponds to the maximum generalized eigenvalue of matrices (H, M), as shown in Materials and Methods. The sensitivity matrices are useful in studying the effect of changing the number of hidden variables and unknown parameters on the practical identifiability of a model. Assume that we have access to one more variable, thus effectively increasing the size of the set of measured variables to n m ′ = n m + 1 and, correspondingly, reducing that of the unmeasured variables to n h ′ = n h − 1. This corresponds to considering new variables m´ and h´. From the definition in Eq. 3, the new sensitivity matrix can be written as M′ = M + M 1 , where M 1 is the sensitivity matrix for the newly measured variable. Given Weyl's inequality [page 239 of (30)], we have that  1 (M ′ ) ≥  1 (M) +  1 (M 1 ) and since M 1 is also positive semidefinite,  1 (M ′ ) ≥  1 (M). This means that measuring one further variable (or more than one) of the system increases the practical identifiability of a model, as expected. As H ′ = H − M 1 , it is also possible to demonstrate that (M ′ ) ≤ (M) (see Materials and Methods). Let us now consider a different scenario: Suppose we have a priori knowledge of one of the model parameters so that we do not need to estimate its value by fitting the model to the data. In this case, we can define new sensitivity matrices ˜ M , ˜ H ∈ ℝ ( n p −1)×( n p −1) for the measured and unmeasured variables, respectively. Given the Cauchy's interlacing theorem [page 242 of (30)], we have that  1 ( ˜ M ) ≥  1 (M) , which implies that practical identifiability is improved by acquiring a priori information on some of the model parameters. For instance, in the context of COVID-19 models, one may decide to fix some of the parameters, such as the rate of recovery, to values derived from medical and biological knowledge (24, (31) (32) (33) and to determine from fitting the more elusive parameters, such as the percentage of asymptomatic individuals or the rates of transmission. The sensitivity measures we have introduced point out that prior knowledge of some of the parameters, or a larger set of measurable variables, reduces the sensitivity of the measured variables to the parameters and that of the hidden variables to a variation in the measured ones. However, gathering further knowledge can be difficult or even not possible so that these results, which should not be interpreted as an oversimplified solution to the problem of identifiability, have to be considered in the light of practical issues that might arise in the measurement of the model variables and parameters. The sensitivity measures reveal different regimes of identifiability As a first application, we study the practical identifiability of a four compartment mean-field epidemic model (34) in the class of SIAR models (35) , developed to assess the impact of asymptomatic carriers of COVID-19 (8, 36, 37) and other diseases (38) (39) (40) . In such a model (Fig. 1) , a susceptible individual (S) can be infected by an infectious individual who can either be symptomatic (I) or asymptomatic (A). The newly infected individual can either be symptomatic (S → I) or asymptomatic (S → A). Furthermore, we also consider the possibility that asymptomatic individuals develop symptoms (A → I), thus accounting for the cases in which an individual can infect before and after the onset of the symptoms (41) . Last, we suppose that individuals cannot be reinfected as they acquire a permanent immunity (R). One of the crucial aspects of COVID-19 is the presence of asymptomatic individuals who are difficult to trace as the individuals themselves could be unaware about their state. Consequently, we assume that the fraction of asymptomatic individuals, a(t), is not measurable, while the fractions of symptomatic, (t), and recovered, r(t), are measured variables, that is, m ≡ [, r] and h ≡ [s, a]. As mentioned above, practical identifiability is a property of the trajectories of the system, which are uniquely determined by the values of the unknown parameters p. Here, we illustrate how the sensitivity of both measured and unmeasured variables changes with the probability  that a newly infected individual shows no symptoms when all the other parameters of the model are fixed (to the values reported in Materials and Methods). Concerning the choice of vector p, here, we consider the following case. First, as the number of symptomatic infectious and recovered individuals are supposed to be measurable, we have assumed the initial conditions (0), r(0), and the recovery rate of the symptomatic individuals, i.e.,  IR ,to be known quantities. Second, we assume to be able to measure, for instance, through backward contact tracing, the rate at which asymptomatic individuals develop symptoms, i.e.,  AI . Hence, the parameters to be determined are the remaining ones, i.e., p = [a(0),  I ,  A , ,  AR ]. Figure 2A shows a nontrivial nonmonotonic dependence of our sensitivity measures,  and , on . The value of  has a peak at  = 0.51, in correspondence of which  takes its minimum value. This represents an optimal condition for practical identifiability, as the sensitivity to parameters of the measured variables is high, while that of the unmeasured ones is low, and this implies that the unknown quantities of the system (both the model parameters and the hidden variables) can be estimated with small uncertainty. On the contrary, for  = 0.86, we observe a relatively small value of  and a large value of , meaning that the measured variables are poorly identifiable, and the unmeasured variables are sensitive to a variation of parameters. This is the worst situation in which the estimated parameters may substantially differ from the real values, and the hidden variables may experience large variations even for small changes in the parameters. Furthermore, the quantity , which measures the sensitivity of the hidden variables to the measured ones, reported in Fig. 2B , exhibits a large peak at the value of  for which  is minimal. This is due to the fact that the vector that determines  is almost aligned with v 1 . When this holds, we have that  = η/, which explains the presence of the spike in the  curve. Similarly, the sensitivity  takes its minimum almost in correspondence of the maximum of . The behavior of the model for  = 0.86 is further illustrated in Fig. 2C , where the trajectories obtained in correspondence to the unperturbed values of the parameters, i.e., m(t, p) and h(t, p) (solid lines), are compared with the dynamics observed when p undergoes a perturbation with ∥p∥ = 0.3∥p∥ along v 1 (dashed lines). The small sensitivity  of the measured variables (t, p) and r(t, p) to parameters is reflected into perturbed trajectories that remain close to the unperturbed ones, whereas the large sensitivity  of the unmeasured variables s(t, p) and a(t, p) yields perturbed trajectories that significantly deviate from the unperturbed ones. We now illustrate the different levels of identifiability that appear in the SIAR model for diverse settings of the parameters. Its analysis, in fact, fully depicts the more complete perspective on the problem of practical identifiability offered by simultaneously inspecting the sensitivity measures,  and . As the two sensitivity measures are not necessarily correlated, there can be cases for which a high identifiability of the measured variables to the parameters, i.e., large values of , corresponds to either a low or a high identifiability of the hidden variables to the parameters. Analogously, for other system configurations, in correspondence of small values of , namely, to nonidentifiable parameters, one may find large values of , meaning that the hidden variables are nonidentifiable as well or, on the contrary, small values of , indicating that the hidden variables are poorly sensitive to parameter perturbations. Together, four distinct scenarios of identifiability can occur, and all of them effectively appear in the SIAR model ( Fig. 3) : (A) low identifiability of the model parameters p and high identifiability of the hidden variables h, (B) high identifiability of both p and h, (C) low identifiability of both p and h, and (D) high identifiability of p and low identifiability of h. To illustrate them, we have considered four distinct configurations of the model (with parameters as given in Table 3 and illustrated in Materials and Methods) and, for each case study, compared the unperturbed trajectories to the perturbed ones, with the vector of parameters undergoing a variation ∥p ∥ = 0.3 ∥ p∥ along v 1 . As regard cases (A) and (C), we have considered the vector of parameters to determine to be p = [(t), a(0), r(t),  I ,  A , ,  IR ,  AR ,  AI ], while for the cases (B) and (D), we have p = [a(0),  I ,  A , ,  AR ], which is the same choice of p adopted in Fig. 2 . Figure 3 shows the results obtained for each parameter configuration. In each panel, the solid lines represent the unperturbed trajectories, while the dashed lines correspond to the perturbed dynamics. In cases (A) and (B), we see that, under the variation p, the perturbed trajectories of the hidden variables remain close to the unperturbed dynamics. Hence, the hidden variables are highly identifiable. Conversely, in cases (C) and (D), the perturbed trajectories substantially differ from the unperturbed dynamics, meaning that the hidden variables are poorly identifiable as they are sensitive to a variation of the model parameters. As concerns the measured variables, in cases (A) and (C), the perturbed trajectories slightly differ from the unperturbed dynamics. Therefore, as the measured variables are insensitive to the perturbation p, the model parameters have a low degree of identifiability. On the other hand, in cases (B) and (D), the perturbation of the parameters significantly affects the trajectories of the measured variables, meaning that the set of parameters reproducing the observed data is more identifiable. Last, Table 1 illustrates the values of the sensitivity measures , , and  for each case. In particular, case (C) represents the worst scenario as the value of  is relatively small, meaning that the model parameters p are poorly identifiable, and the value of  is large, indicating a high sensitivity of the hidden variables to the parameters. Conversely, the best scenario is represented by case (B), for which both the model parameters and the hidden variables are highly identifiable as the value of  is large compared to the other cases, while the value of  remains relatively small. Poorly identifiable models may provide unreliable predictions when the parameters are estimated from data So far, we have illustrated how variations on the parameters affect the trajectories of measurable and hidden variables under different degrees of identifiability. However, a high sensitivity of the hidden variables to measured ones has relevant practical consequences, especially when the parameters are unknown and need to be fitted from data. In these conditions, a small uncertainty in the measurable variables due to the presence of noise in the data can propagate markedly and make the prediction of the hidden variables unreliable. Hence, in this section, we study how the lack of practical identifiability can affect the predictions of the SIAR model when this is fitted to empirical data. The reliability of the model predictions has been investigated by means of two different fitting techniques: a least square error minimization procedure (42) , which provides a point estimate of the parameters, and a Bayesian inference approach (43) , which, conversely, gives an estimate of the probability distribution in the parameter space. To carry out the numerical analysis, we consider the model under the same settings (i.e., fixing the values of the six parameters and the initial values of three variables) as those adopted in case (C) in the previous section, which correspond to the case of low identifiability of both the parameters and the hidden variables. We then generate from such a model a synthetic dataset of trajectories, which we fit using the two approaches mentioned above (see Materials and Methods for further details). All the model parameters are considered unknown and thus need to be determined through the fit. First, we consider the least square error minimization approach. To show how, because of the lack of identifiability of the model, significant variations in the dynamics of the hidden variables can be obtained when fitting the measured variables, we have performed the following analysis. As the estimation procedure (based on a nonlinear optimization algorithm; see Materials and Methods) starts from an initial guess of the fitting parameters, indicated as p 0 , instead of fitting a single set of values, we have repeated the procedure under the very same conditions of the algorithm, for 500 runs, randomly selecting p 0 from a Gaussian distribution centered on a fixed point of the parameter space and with variance equal to 0.25. We then discarded those runs yielding a fitting error d > 0.015, which corresponds to a relative error of 2.5%, thus keeping a total of 65 sets of parameters fitting the measured variables with a similar value of the error. The fact that different sets of parameters are obtained in this way may indicate that the error function has several local minima. Figure 4 (A to D) displays the average trajectories (over the 65 sets of parameters) of the four state variables (solid lines) and the respective regions where 95% of the trajectories lie (shadowed area). While the dynamics of the measured variables (Fig. 4, C and D) produced by the SIAR model are in very good agreement with the data, the same is not true for the temporal evolution of the hidden variables (Fig. 4, A and B) . In particular, the number of newly asymptomatic infected individuals is substantially overestimated. We now consider the Bayesian inference approach. To implement it, we used the Delayed Rejection Adaptive Metropolis (DRAM) (44), a Markov Chain Monte Carlo (MCMC) algorithm, with settings as described in Materials and Methods. In particular, since we have here assumed to have no a priori information on the value of the model parameters, we have considered uniform prior probability distributions, representing the less informative conditions for the model (further details are given in Materials and Methods). Figure 4 (E to H) shows the temporal evolution of the SIAR variables. Solid lines represent the average trajectory obtained by sampling 500 sets of parameters from the posterior distributions, while the shadowed areas indicate the regions where 95% of the trajectories lie. Similarly, to the case of the least square minimization, while the dynamics of the measured variables (Fig. 4 , G and H) is in a good agreement with the synthetic data, the prediction of the hidden variables (Fig. 4 , E and F) is not. At variance with the previous example, the number of newly asymptomatic infected individuals is here largely underestimated. Hence, these results indicate that the lack of identifiability can lead to unreliable results even when a Bayesian approach is adopted. In the analysis presented in this section, we have assumed to have no a priori information on the values of the model parameters. Therefore, when performing the least square error minimization, we have extracted all the parameters from fitting, while, following the same reasoning, we have chosen a uniform prior probability distribution in the Bayesian approach. When instead we have strong a priori knowledge of a disease, this can be used to inform the models, by fixing the values of certain parameters while estimating the others, in the case of the least square error method, or by considering more informative prior distributions, in the case of Bayesian inference. When a priori information on the values of the parameters can be obtained, for instance, through medical and biological studies, the model predictions are expected to become less affected by uncertainty. The analysis of the sensitivity matrices (see also Materials and Methods) confirms this expectation as we have demonstrated that additional knowledge of the parameters, or the measurement of a hidden variable, can reduce the sensitivity to the measured variables, thus improving the reliability of the prediction. However, there are cases in which a priori knowledge is not available, and the analysis of identifiability becomes crucial. As an example, one could estimate the percentage of asymptomatic individuals according to serological surveys or to longitudinal studies. However, these data can be unavailable at an early stage of an epidemic outbreak (45, 46) , preventing their use to inform the epidemiological models. These considerations hallmark once again the need for a synergistic approach to study newly discovered infectious diseases and stress the importance of assessing the reliability of mathematical modeling when the amount of available information is limited. As a second application, we show the relevance of the problem of practical identifiability in the context of COVID-19 pandemic modeling. We consider a realistic model (Fig. 5) of the disease propagation, that is a variant of the SIDARTHE model (16) and is characterized by nine compartments accounting respectively for susceptible (S), exposed (E), undetected asymptomatic (I A ), undetected symptomatic (I S ), home isolated (H), treated in hospital (T), undetected recovered (R u ), detected recovered (R d ), and deceased (D) individuals. Following the study of Giordano et al. (16) , to account for the different nonpharmaceutical interventions and testing strategies issued during the COVID-19 outbreak in Italy (47, 48) , the model parameters have been considered piece-wise constant and estimated using nonlinear optimization by fitting of the official data provided by the Civil Protection Department (49) . As the dataset provides the evolution in time of the daily number of home isolated, hospitalized, detected recovered, and deceased individuals, It is here worth discussing an important issue that concerns the model parameters. We have considered that different policy strategies affect, according to their nature, only specific parameters. In particular, we have assumed that a change in the containment strategy leads to a variation in the transmission rates , while an adjustment in the testing strategy affects the values of the parameters  I S H ,  HT , and  HR d (further details are reported in Materials and Methods). As both the containment and the testing strategies in Italy have frequently changed during the pandemic, most of the parameters to estimate consist of transmission and detection rates. While other parameters, such as the death or the recovery rates, can be derived from the current literature (50) , very limited information is available on the transmission and detection rates that are difficult to measure directly and, therefore, need to be estimated by fitting available data. Hence, as in (16), we have assumed all the model parameters to be unknown. To show how significant variations in the evolution of the hidden variables can arise when fitting the measured variables, we have performed a numerical analysis similar to the one of the previous section. Again, rather than fitting a single set of values, we have repeated the minimization procedure under the same conditions of the algorithm, for 500 runs, randomly selecting the initial guess p 0 from a Gaussian distribution centered on a fixed point of the parameter space, with variance equal to 0.25. We discarded the runs yielding a fitting error e > 900, corresponding to a relative error of 1.4%, thus keeping a total of 40 sets of parameters. Figure 6 shows the dynamical trajectories that we have obtained for each of the 40 sets of parameters (solid lines). Both measured (Fig. 6 , A to D) and hidden (Fig. 6 , E to H) variables are reported. While the time evolution of the measured variables produced by the model is in very good agreement with the empirical data, reported as circles in Fig. 6 , significant differences in the trend of the hidden variables appear. A large variability is observed, confirming that the lack of identifiability yields a high sensitivity of the hidden variables to the measured one. These findings have relevant implications. The large uncertainty on the size of the asymptomatic population makes questionable the use of the model as a tool to decide the policies to adopt. The practical identifiability of a dynamical model is a critical, but often neglected, issue in determining the reliability of its predictions. In this paper, we have introduced a novel framework to quantify: (i) the sensitivity of the dynamical variables of a given model to its parameters, even in the presence of variables that are difficult to access empirically and (ii) how changes in the measured variables affect the evolution of the unmeasured ones. The measures we have proposed are easy to compute and enable to assess, for instance, if and when the model predictions on the unmeasured variables are reliable or not, even in the cases in which the parameters of the model can be fitted with high accuracy from the available data. As we have shown with a series of case studies, practical identifiability can critically affect the predictions of even very refined epidemic models introduced for the description of COVID-19, where dynamical variables, such as the population of asymptomatic individuals, are impossible or difficult to measure. This by no means should question the importance of these models-in that they enable a scenario analysis, otherwise impossible to carry out, and a deeper understanding of the spreading mechanisms of a novel disease-but should hallmark the relevance of a critical analysis of the results that takes into account sensitivity measures. It also highlights the importance of cross-disciplinary efforts that can provide a priori information on some of the parameters, ultimately improving the reliability of a model (8, 24) . A problem related to the one studied in our paper is that of observability, which investigates how to reconstruct the internal state of a system from measurements on the input and output, under the hypothesis that the model and its parameters are known (51, 52) . Techniques based on the observability problem are clearly extremely important and may be applied, for instance, to derive the time evolution of asymptomatic individuals from measurements on infected and recovered individuals, when it is possible to develop a fully observable model with known parameters. The sensitivity matrices considered in this paper are given by where the vector functions m = m(t, p) and h = h(t, p) are obtained integrating system in eq. 1. The derivative of measurable and hidden variables with respect to the parameters p can be obtained by integrating the system where i = 1, …n p . The numerical evaluation of the sensitivity matrices is carried out first by integrating system in eq. 7 (for this step we use a fourth-order Runge-Kutta solver with adaptive step size control), resampling the trajectories with a sampling period of 1 day, and then performing a discrete summation over the sampled trajectories. Moreover, integration is carried out over a finite time interval [0, ], with large enough . In the context of our work, as we have considered SIR (susceptible infected removed)-like epidemic models, we set the value of  such that the system has reached a stationary state, i.e., the epidemic outbreak has ended, as every infected individual has eventually recovered (or dead, depending on the model). We now present an important property of the sensitivity matrices. We will only take into account the set of measured variables m, as similar considerations can be made for the hidden variables. Let us assume to be able to measure only a single variable, so that the vector m collapses into a scalar function, which we call m 1 (t). In this case, the element M ij of the sensitivity matrix would be simply given by Let us call this sensitivity matrix M 1 . Consider now a larger set of measured variables m = (m 1 , m 2 , …, m nm ). The quantity ∂m T /∂p i ∂m/∂p j in Eq. 6 is given by Therefore, integrating over time in the interval [0, ∞ ] and given the linearity property of the integrals, we find that the sensitivity matrix M of the set of the measured variables is given by the sum of the sensitivity matrices of the single measured variables. Formally, we have that M = M 1 + M 2 + … + M n m (10) This property of the sensitivity matrices is useful to demonstrating how measuring a further variable affects the sensitivity measures  and , as discussed in the following subsection and in Results. Last, because matrices M and H are positive semidefinite, their eigenvalues are nonnegative. For any positive semidefinite matrix A of order m, we shall denote its eigenvalues as 0 ≤  1 (A) ≤  2 (A) ≤ … ≤  m (A). Here, we discuss in more detail the sensitivity measures introduced in Results. First, we want to propose a measure to quantify the practical identifiability of the model parameters given the measured variables. To do this, we need to evaluate the sensitivity of the trajectories of the measured variables to a variation of the model parameters. If this sensitivity is small, then different sets of parameters will produce very similar trajectories of the measured variables, meaning that the parameters themselves are poorly identifiable. In particular, as a measure of the parameters identifiability, we can consider the worst scenario, namely, the case in which the perturbation of the parameters minimizes the change in the measured variables. This happens when the variation of the model parameters p is aligned along the eigenvector v 1 of M corresponding to the minimum eigenvalue  1 (M). Given the definition of M, we have that ∥ m ∥ = √ _  1 (M) ∥ p ∥ ; hence, we can consider the quantity as an estimate of the sensitivity of the measured variables to the parameters. Note that, here and in the rest of the paper, ∥v∥ denotes the Euclidean norm of a finite dimensional vector v, ∥v∥ 2 = v · v, while for a function u(t), ∥u∥ denotes the L 2 norm of u in [0, ∞ ], i.e., ∥ u ∥ 2 = ∫ 0 ∞ u · u dt . Let us now focus on the hidden variables h. In general, as the hidden variables are not directly associated to empirical data, the largest uncertainty on the hidden variables is obtained in correspondence of a variation of the parameters along the eigenvector of H associated to the largest eigenvalue, namely,  n p (H). Hence, to quantify the sensitivity of the hidden variables to the parameters, one may consider However, it is crucial to note that the hidden variables ultimately depend on the parameters of the model, which are estimated by fitting data that are available for the measured variables only. As a consequence, it is reasonable to consider a quantity that evaluates how the uncertainty on the model parameters (determined by the uncertainty of the measured variables and by their sensitivity to the parameters) affects the identifiability of the hidden variables. Therefore, as a measure of the sensitivity of the hidden variables to the parameters, we consider where p 1 is a perturbation of the parameters along the eigenvector v 1 of M corresponding to the minimum eigenvalue  1 (M). Note that, when v 1 and the eigenvector of H corresponding to the largest eigenvalue  n p (H) are aligned, by definition, we have  =  MAX . Last, we want to define a quantity to estimate how much the hidden variables are perturbed given a variation of the measured ones. In particular, as a measure of the sensitivity of the hidden variables to the measured variables, we consider the maximum perturbation of the hidden variables given the minimum variation of the measured ones, which is Note that  2 can be computed considering the following generalized eigenvalue problem where H and M are the sensitivity matrices for the hidden and the observed variables respectively, and  k =  k (M, H) denotes the k-th generalized eigenvalue of matrices M and H. We will denote by  n p the largest generalized eigenvalue and u the corresponding generalized eigenvector. Note that, since both matrices are symmetric, if u is a right eigenvector, then u T is a left eigenvector. Multiplying each member of the equation by u T and dividing by u T Mu, we obtain where one can recognize the definition of  2 provided in Eq. 14. In other words,  2 represents the largest eigenvalue of the matrix M −1 H. It is worth noting two aspects about the sensitivity measure . First, given the definitions in Eqs. 11 and 12, for any p with ∥p∥ = 1, we have that  p T Hp ≤  MAX 2 and p T Mp ≥  2 . As a consequence, we have that Second, when the vector p that determines  is aligned with the eigenvector v 1 of M, it is possible to express  in terms of the sensitivity measures  and . When p = ∥ p ∥ v 1 = v 1 , recalling definitions in Eqs. 11 and 13, one obtains v 1 In addition, we note that if v 1 and the eigenvector of H corresponding to its largest eigenvalue are aligned, one obtains that  =  MAX /, which is the maximum value for the sensitivity measure . We now demonstrate that the sensitivity of the hidden variables to the measured ones,  2 , decreases as we measure one further variable. Let us assume now that we are able to measure one further variable, thus increasing the size of the set of measured variables to n m ′ = n m + 1 and, correspondingly, reducing that of the unmeasured variables to n h′ = n h − 1. Given the property in Eq. 10, the new sensitivity matrices can be written as M ′ = M + M 1 and H ′ = H − M 1 , where by M 1 , we denote the sensitivity matrix for the newly measured variable. The new generalized eigenvalue problem is H′u′= ′M′u′ ⇔ (H − M 1 ) u′= ′(M + M 1 ) u′ (19) where, for simplicity, we have denoted by  ′ the largest generalized eigenvalue of matrices M ′ and H ′ . Left multiplying by u ′ T and dividing by u ′ T Mu ′ , we obtain where the first inequality comes from the definition of  n p , while the second comes from the fact that H, M, H ′ , M ′ , and M 1 are positive semidefinite. In short, we find that  n p ≥ ′, meaning that, by measuring one variable, the sensitivity of the hidden variables to the measured ones decreases. The SIAR model of Fig. 1 is described by the following equations where s(t), (t), a(t), and r(t) represent population densities, i.e., , and R(t) represent the number of susceptible, infectious, asymptomatic, and recovered individuals, and N is the size of the population, so that s(t) + (t) + a(t) + r(t) = 1. Here,  I and  A are the transmission rates for the symptomatic and the asymptomatic individuals, respectively,  is the probability for newly infected individuals to show no symptoms,  AI is the rate at which asymptomatic individuals become symptomatic, and  IR and  AR are the recovery rates for the two infectious populations. Note that all these parameters are positive quantities. Asymptomatic individuals are difficult to trace as the individuals themselves could be unaware about their state. As a consequence, we assume that the density of asymptomatic individuals is not measurable, while the densities of symptomatic and recovered individuals are measured variables. According to the notation introduced in Eq. 1, we therefore have that m ≡ [, r] and h ≡ [s, a]. Note that, as a first approximation, here, we assume to be able to trace the asymptomatic individuals once they recover. The results presented in Fig. 2 have been obtained considering the following setup. As the number of symptomatic infectious and recovered individuals are considered measurable, we have assumed that the initial conditions (0), r(0), and the rate of recovery  IR are known parameters. Second, we have supposed to be able to measure, for instance, through backward contact tracing the rate at which asymptomatic individuals develop symptoms, i.e.,  AI . Hence, the vector of parameters to determine by calibrating the model is given by p = [a(0),  I ,  A , ,  AR ]. Table 2 displays the value of the model parameters used to obtain the results shown in Fig. 2 . For the analysis of the four scenarios considered in Fig. 3 , the values of the model parameters have been set as given in Table 3 . The SIAR model parameters have been estimated from data by adopting two different approaches, i.e., a nonlinear least square error minimization and a Bayesian inference. To generate a synthetic dataset, we have integrated the deterministic model in Eq. 21. To mimic measurement errors, we have adopted the following procedure. First, we compute r ̄ (t) by adding to r(t) a uniform noise in the interval (−  t /2,  t /2), where  t = |r(t + 1) − r(t)|, checking that the synthetic time series remains monotonically nondecreasing. We have then generated the data s ̄ (t) in a similar fashion, this time controlling that the synthetic time series remains monotonically nonincreasing. To generate the data  (t) , we have added a Gaussian noise with zero mean and standard deviation (SD) equals to 3% to the time series, making sure that s ̄ (t ) +  (t ) + r ̄ (t ) ≤ 1 . Last, the data a ̄ (t) have been evaluated using the fact that s ̄ (t ) +  (t ) + a ̄ (t ) + r ̄ (t ) = 1 . The integration of Eq. 21 has been carried out by using the lsoda ordinary differential equation (ODE) solver (53, 54) and then resampling the data with a sampling period of one time unit. We assumed that the density of asymptomatic individuals a is not measurable, while the densities of symptomatic and recovered individuals, i.e.,  and r, are measured variables. As regard to the least square error minimization approach, the model parameters have been estimated using a nonlinear optimization procedure (implemented via the function fmincon in MATLAB) with the following objective function to minimize where  (k) and r ̄ (k) with k = 1, …,  (with  = 50) represent the noisy synthetic time series of the densities of infectious and recovered individuals, respectively, while (k) and r(k) are the values of the corresponding variables obtained from the integration of Eq. 21. The core idea of Bayesian inference is to provide an a posteriori probability distribution for the model parameter vector, p, given an a priori probability distribution on the value of p and a likelihood function, which quantifies the goodness of a model in reproducing empirical data D. The relationship between these is given by the Bayes' theorem, which reads Table 3 . Values of the model parameters used for the case study in Fig. 3 . where (p) indicates the prior distribution, ℒ(D|p) the likelihood, and (p|D) the posterior distribution. Usually, it is not possible to evaluate analytically the integral appearing in the denominator, especially when a large number of parameters are considered. Therefore, one relies on MCMC algorithms, which allow one to approximate of the posterior distribution. The MCMC algorithm we used to implement the Bayesian inference is the DRAM (44) . As the likelihood function ℒ(D|p), we have considered the root mean square error, evaluated on the measurable variables only, i.e. (t) and r(t), which corresponds to Eq. 22, namely, to the objective function of the nonlinear optimization procedure. As we have assumed to have no a priori knowledge of the values of the model parameters, for the Bayesian inference, we have considered uniform prior probability distributions, which are the simplest and least informative choice (55, 56) . Flat priors do not require any additional information apart from setting the interval of possible parameter values. These intervals have been defined taking into account the following considerations. On the one hand, we have the initial conditions of the dynamical variables. As in the SIAR model, these represent population densities, we can assume the uniform prior distribution for their initial conditions to be defined in the interval [0,1]. Similarly, the parameter , which indicates the fraction of newly infected individuals not developing symptoms, can be assumed to be defined in the same interval. As regard the remaining parameters, since we have assumed to not have any other information except for the fact that they are positive quantities, we can consider the uniform distribution to be extended in the interval [0, ∞ ]. The nine-compartment model of Fig. 5 can be considered as a variant of the SIDARTHE model (16) . It is characterized by the presence of an incubation state, in which the individuals have been exposed to the virus (E) but are not yet infectious, and by infectious individuals, that, in addition to being symptomatic or asymptomatic, can be either detected or undetected. The model, therefore, includes four classes of infectious individuals: undetected asymptomatic (I A ), undetected symptomatic and pauci-symptomatic (I S ), home isolated (H, corresponding to detected asymptomatic and pauci-symptomatic), and treated in hospital (T, corresponding to detected symptomatic). Last, removed individuals can be undetected (R u ), detected (R d ), or deceased (D). The model dynamics is described by the following equations All the parameters appearing in (24) are considered unknown; thus, they need to be determined through fitting the model to the available data. It should also be noted that, as many nonpharmaceutical interventions have been issued/lifted, and the testing strategy has been changed several times over the course of the epidemics (47, 48) , not all parameters can be considered constant in the whole period used for the fitting. Hence, similarly to (16), we have divided the whole period of investigation (which in our case ranges from 24 February to 06 July 2020) into different windows, within each of which the parameters are assumed to be constant. In each time window, one allows only some parameters to vary according to what is reasonable to assume will be influenced by the government intervention during that time window. We distinguish two kinds of events that may require an adaptation of the model parameters. On the one hand, there are the nonpharmaceutical containment policies aimed at reducing the disease transmission. When these interventions are issued, the value of the parameters  may vary. On the other hand, the testing strategy, which affects the probability of detecting infected individuals, was also not uniform in the investigated period. When the testing policy changes, the value of the parameters  I S H ,  HT , and  HR d may vary. Here, we notice two important points. First, the value of  I S T is assumed to be constant in the whole period, as we suppose that there are no changes in how the symptomatic individuals requiring hospitalization are detected. Second, as a change in the sole parameter  I S H would affect too much the average time an individual remains infected, then  HT and  HR d also have to be included in the set of parameters that may change. On the basis of these considerations, the intervals in which each parameter remains constant or may change are identified. This defines the specific piece-wise waveform assumed for each of the parameters appearing in the model and, consequently, the effective number of values that need to be estimated for each parameter. Hereafter, we summarize the events defining the different windows in which the whole period of investigation is partitioned: 1) On 02 March, a policy limiting screening only to symptomatic individuals is introduced. 2) On 12 March, a partial lockdown is issued. 3) On 18 March, a stricter lockdown, which further limits nonessential activities, is imposed. 4) On 29 March, a wider testing campaign is launched. Starting from this date, as the number of tests has constantly increased while the number of new infections was decreasing, the parameters are allowed to change every 14 or 28 days, namely, on 11 April, 25 April, and 23 May. 5) On 04 May, a partial lockdown lift is proclaimed. 6) On 18 May, further restrictions are relaxed. 7) On 03 June, interregional mobility is allowed. This is the last time the model parameters are changed. How will country-based mitigation measures influence the course of the covid-19 epidemic? Coronavirus disease (covid-19): Weekly epidemiological update An interactive web-based dashboard to track covid-19 in real time Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): A review Review of the 2019 novel coronavirus (SARS-CoV-2) based on current evidence COVID-19 and SARS-CoV-2. Modeling the present, looking at the future The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak First-wave covid-19 transmissibility and severity in China outside hubei after control measures, and second-wave scenario planning: A modelling impact assessment Data analysis on coronavirus spreading by macroscopic growth laws Containment effort reduction and regrowth patterns of the COVID-19 spreading Analysis and forecast of covid-19 spreading in china, italy and france A mathematical model for the spatiotemporal epidemic spreading of COVID19. medRxiv Early dynamics of transmission and control of COVID-19: A mathematical modelling study Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19 A network model of italy shows that intermittent regional strategies can alleviate the COVID-19 epidemic Dynamical SPQEIR model assesses the effectiveness of non-pharmaceutical interventions against COVID-19 epidemic outbreaks Model calibration and uncertainty analysis in signaling networks Structural identifiability of dynamic systems biology models Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts Systematic identifiability testing for unambiguous mechanistic modeling-application to JAK-STAT, MAP kinase, and NF- B signaling pathway models Why is it difficult to accurately predict the COVID-19 epidemic? Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease Structural and practical identifiability analysis of outbreak models Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood Structural and practical identifiability analysis of zika epidemiological models Assessing parameter identifiability in compartmental dynamic models using a computational approach: Application to infectious disease transmission models Matrix Analysis Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the diamond princess cruise ship Epidemiology and transmission of covid-19 in shenzhen china: Analysis of 391 cases and 1 Imperial College London COVID-Response Team A new SAIR model on complex networks for analysing the 2019 novel coronavirus (COVID-19) Implications of asymptomatic carriers for infectious disease transmission and control SEIAR model with asymptomatic cohort and consequences to efficiency of quarantine government measures in COVID-19 epidemic Investigating the impact of asymptomatic carriers on COVID-19 transmission A model for the emergence of drug resistance in the presence of asymptomatic infections Seasonal transmission potential and activity peaks of the new influenza A(H1N1): A Monte Carlo likelihood analysis based on human mobility Modeling the spatial spread of infectious diseases: The global epidemic and mobility computational model Temporal dynamics in viral shedding and transmissibility of COVID-19 Chapter 11: Mathematical and Statistical Estimation Approaches in Epidemiology, in Mathematical and Statistical Estimation Approaches in Epidemiology Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems DRAM: Efficient adaptive MCMC Modelling COVID-19 Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic Presidenza del Consiglio dei Ministri (Presidency of the Council of Ministers) Presidenza del Consiglio dei Ministri (Presidency of the Council of Ministers), Legislation issued in response to covid-19 epidemic Civil protection department), Data on the national trend Estimates of the severity of coronavirus disease 2019: A model-based analysis On the observability of nonlinear systems: I Observability functions for linear and nonlinear systems Odepack, a systematized collection of ODE solvers Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations Modelling COVID-19 dynamics and potential for herd immunity by vaccination in Austria, Luxembourg and Sweden Bayesian Inference in Statistical Analysis We would like to thank V. Simoncini for pointing out the relation between the sensitivity  and the generalized eigenvalue and D. Proverbio Note that, for the time period until 5 April, we have followed the same time partition used in (16) .The model parameters have been estimated using a nonlinear optimization procedure (implemented via the function fmincon in MATLAB) with the following objective function to minimize e =, and D ̄ (k) with k = 1, …,  ( = 134 days) represent the time series of daily data for isolated, hospitalized, detected recovered, and deceased individuals provided by the Civil Protection Department (49) , and H(k), T(k), R d (k), and D(k) are the values of the corresponding variables obtained from the integration of Eq. 24. The integration of Eq. 24 has been carried out by using a suitable ODE solver with maximum integration step size equal to 10 −2 days and then resampling the data with a sampling period of 1 day.