key: cord-0643280-0yx291yl authors: Pluchino, A.; Biondo, A. E.; Giuffrida, N.; Inturri, G.; Latora, V.; Moli, R. Le; Rapisarda, A.; Russo, G.; Zappala', C. title: A Novel Methodology for Epidemic Risk Assessment: the case of COVID-19 outbreak in Italy date: 2020-04-06 journal: nan DOI: nan sha: 453e9b36ec4e9425093d5795c9a2d9202f216fa6 doc_id: 643280 cord_uid: 0yx291yl We propose a novel data-driven framework for assessing the a-priori epidemic risk of a geographical area and for identifying high-risk areas within a country. Our risk index is evaluated as a function of three different components: the hazard of the disease, the exposure of the area and the vulnerability of its inhabitants. As an application, we discuss the case of COVID-19 outbreak in Italy. We characterize each of the twenty Italian regions by using available historical data on air pollution, human mobility, winter temperature, housing concentration, health care density, population size and age. We find that the epidemic risk is higher in some of the Northern regions with respect to Central and Southern Italy. The corresponding risk index shows correlations with the available official data on the number of infected individuals, patients in intensive care and deceased patients, and can help explaining why regions such as Lombardia, Emilia-Romagna, Piemonte and Veneto have suffered much more than the rest of the country. Although the COVID-19 outbreak started in both North (Lombardia and Veneto) and Central Italy (Lazio) almost at the same time, when the first cases were officially certified at the beginning of 2020, the disease has spread faster and with heavier consequences in regions with higher epidemic risk. Our framework can be extended and tested on other epidemic data, such as those on seasonal flu, and applied to other countries. We also present a policy model connected with our methodology, which helps policy-makers to take informed decisions. The prediction of the future developments of a natural phenomenon is one of the main goals of science, but it remains always a great challenge especially when the phenomenon that one is observing involves people that can have a feedback reaction on the observed quantities. This is particularly true in the case of epidemics, especially with the COVID-19 outbreak that the world is suffering in this period. SARS-CoV-2 is a novel coronavirus initially announced as the causative agent of pneumonia of unknown etiology in Wuhan city, China. The genome sequence is related to a viral species named severe acute respiratory syndrome (SARS) related-CoV. These viral species also comprise some viruses detected in rhinolophid bat in Europe and Asia (Zhang 2019 , Peiris 2003 . A specific protein (S protein) of virus surface facilitates viral entry in the target cells by angiotensin converting enzyme 2 (ACE2) (Hoffman et al. 2020) ; the membrane glycoprotein Dipeptidyl peptidase-4 (DPP-4) is also considered a potential key factor in the determinism of coronavirus infections (Doremalen 2014) . Several cytokines generated during the virus infection causes a hyper-activation of Janus Protein Kinase (JAK) / Signal Transducers and Activators of Transcription (STAT) protein family, that is involved in many specialized cell processes as the development, proliferation and homeostasis (Stebbing 2020) . The mechanisms of immunological response to the virus infection are incompletely known, however a dysregulation of the immune system is very likely responsible for a worse outcome especially in patients with pre-existing respiratory or systemic diseases (Conticini, 2020) . Most infections by coronavirus are mild and selftreated, so that it can be misleading, especially in the early stages of the disease evolution, to estimate the real spread of the virus just on reports of hospital and general practitioner reports. Moreover, these reports vary according to how the numbers of cases are measured, the number of tests being related only to the number of symptomatic patients. However, the large amount of official data made public in recent weeks, and updated day by day (GitHub 2020), has stimulated the development of mathematical models, which are fundamental to understand the possible evolution of an epidemic and to plan effective control strategies (see for example Giordano et al. 2020 , Castro et al 2020 , Castorina et al. 2020 , Lanteri et al. 2020 , Fanelli et al. 2020 , Zlatic et al 2020 , Magdon-Ismail 2020 . Due to the incompleteness of the data and to the intrinsic complexity of the globalized world where we live, predicting the evolution of the pandemic, the epidemic peak and the end of it, is a difficult challenge (Castro et al 2020) . In this regard, it is worth quoting a famous sentence attributed to Niels Bohr, "Prediction is very difficult, especially about the future". In this paper we propose a different approach that aims instead at evaluating the apriori risk of an epidemic, and can then also help strategic planning to prevent or decrease the impact of future epidemics. The COVID-19 outbreak started officially in China in January 2020, although the virus was probably already circulating in the country since late October 2019 according to a recent report (Giovanetti et al. 2020 ). In Italy the first infected Italian patient was officially detected on the night of February 20 in Codogno (Lombardia), even if a recent research study of Lombardia Region reveals that more than 1000 positive cases were present (but not tested) in that region already in the second half of January (Reuters 2020 ). On the other hand, at the end of January, a couple of Chinese tourists coming from Wuhan were hospitalized in Rome (Lazio) after the confirmation test of the infection. So in Italy we have had at least two official starting points of the COVID-19 outbreak, one in the north of Italy and one in the central part (Giovanetti et al. 2020 ) leaving an open question on the reasons for the faster diffusion of the virus in the northern regions of Italy than in the central ones. The approach proposed in this paper can offer a possible explanation for the observed different diffusion and impact of the disease, based on a series of cofactors that can differentiate the regions of Italy in various respects. In particular, the methodology we introduce, based on the Crichton's triangle (Crichton 1999 , Kron 2002 , evaluates the epidemic risk index of the various Italian regions in terms of several factors, such such as air pollution, people mobility, winter temperature, housing concentration, health care density, population size and age, which can be quantified using available data. These factors have been combined to construct a reliable indicator of the a-priori epidemic risk index in Italy, which has then been compared to the impact of the current COVID-19 outbreak. As we will show below, Italian regions which have been mostly affected by the pandemic (in terms of patients in intensive care units (ICU) and deceased ones) are also those with the highest value of risk for a virus epidemic to start, diffuse and be harmful for the population. Furthermore, we will show that our epidemic risk index compares well also with the official available data of seasonal flu 2019-2020 in Italy (Epicentro 2020). Finally, we will propose a theoretical policy model to properly address containment and risk reduction strategies which could be useful not only for the present case, which hopefully will end within a few months, but also in the future to decrease the impact of new epidemic waves. In what follows, before introducing and discussing our new methodology, we would like to discuss the official data released by the Italian Protezione Civile up to April 2 (GitHub 2020), which is the last day when we updated our analysis. Up to April 2 the official report records in Italy we had 115252 total infected cases and 13915 deceased persons due to COVID-19. However, several recent studies have shown that the official data underestimate the correct numbers of infected people. In ref. (Pinotti et al. 2020 ) it has been shown that at least 60% of infected individuals are asymptomatic, and usually difficult to detect. In fact the testing strategies adopted in Italy have generally consisted in testing only those people who show severe symptoms and especially people over 65. For this reason the daily data officially reported depend very much on the number of tests done on the population and result in a biased sample towards aged patients. An evidence in favor of this fact is shown in Fig. 1(a) , based on ISPI report (Villa 2020) , where it is reported the geographic distribution of the apparent lethality rate (i.e. the ratio between the number of deceased patients and the total infected found) recorded in the various Italian regions. The average value is the highest in the world with a value around 10%. In China, where the outbreak started, the average value is 4%. Such a disproportion could be also due, in part, to the fact that in China the average age of the population is 38 years old against a value for Italy equal to 47 (Italy is the country with the most aged population after Japan (Worldometer 2020)). However, the main reason is probably that the real number of infected people is much greater than what official data tell us. Lethality seems to vary a lot also from one Italian region to another. The apparent lethality rate differs very much when one goes from the northern regions of Italy to the central and southern ones. In more detail, according to the official Italian COVID-19 data, as reported in ref. (Villa 2020) on march 24 , we have an apparent lethality rate around 13.6% in Lombardia and around 2.4%, i.e. almost six times less, in Sicilia. Again, such an increase of lethality in Northern Italy with respect to the Center and South of Italy, could be related to the number of tests: as shown in the ISPI report (Villa 2020) there is an interesting correlation among the apparent lethality rate and the number of tests done. If tests are performed only to severe, hospitalized and aged patients, as it has happened in most of the Italian regions with the exception of Veneto, the lethality obviously increases. A plausible realistic estimate of the Italian average lethality rate, done through the comparison with other countries gives an estimate around 1.14% (Villa 2020) . On the basis of these data, it is possible to make a more realistic estimate of the actual number of infected people in Italy, which could be around ten times the one reported by the official data (Villa 2020) . A recent study of the Imperial College (Flaxman et al. 2020) goes even further along this line, estimating that around 10% of the Italian population (i.e. around 6 million people) has been infected by COVID-19 on March 28, against the official number of around 92500 persons at the same date. Due to such uncertain estimates, in order to have a more reliable indicator of the damages attributable to SARS-CoV-2, it is convenient to look to the average total mortality registered in the various regions. In a recent report of the Italian Health Ministry (MS 2020), the excess of recorded daily deaths (due to all causes) during the last months has been compared with the average daily numbers calculated, during the same period, in previous years. The report shows an excess of deaths in the last few weeks (beyond one standard deviation) which in some cases is even 4 times higher than the official COVID-19 data. This can be explained with the fact that many old people, especially in the northern part of Italy, died in their homes or in their hospices without the possibility to be hospitalized and tested. But there is another important information that we can extract from these data. Fig. 1 Summarizing, we can conclude that, in general, the official Italian data on the number of detected positive cases, and on those of hospitalized, intensive care and deceased patients underestimate the real impact of the COVID-19 outbreak. What emerges with no doubt from the data, and in particular from those in Fig. 1(b) , is that the more dramatic impact of the outbreak in the northern regions of Italy than in the rest of the country. At a first sight this may appear strange, considering the two official outbreaks of COVID-19 in the central part (at the end of January) and in the northern part (at the end of February) of Italy (Giovanetti et al. 2020) , and in addition to this the several waves of hundreds of thousands people leaving the northern areas to come back to their home regions in central and southern areas before the lockdown of the country on March 9 (Pepe 2020, The Guardian 2020). This is confirmed by a recent survey on human mobility based on cellular phone data as shown in panels (c) and (d) of Fig. 1 where the outgoing people flow from Lombardia to other regions of central and southern Italy on two different dates, namely February 23, 2020 and March 7, 2020 is reported (Teralytics 2020). It is then plausible that the virus had enough time to spread in the whole country already before any mobility restriction. As we will show below, if we limit ourselves to a coarse grain description, we shall be able to identify groups of regions, with an a-priori level of epidemic risk that is in agreement with the heterogeneous geographical impact of the outbreak. We have investigated a series of factors, which can contribute to the risk of an epidemic diffusion and its impact on the population. Among the many possible, we selected the following variables: mobility index, housing concentration, healthcare density, air pollution, average winter temperature and age of population. In Methods, paragraph 1, we explain, in detail, the rationale under the choice of these variables, which relies on either generic epidemics literature or recently confirmed features of the COVID-19 outbreak, and show the corresponding indicators together with their data sources (see Table M1 ). We also explain, in paragraph 2 of Methods, how these indicators have been normalized between 0 and 1. The first step is, of course, to estimate to what extent the chosen normalized variables individually correlate with the main damage indicators of the COVID-19 epidemic, i.e. the total cases and the total deaths detected in each Italian region, cumulated up to April 2, 2020, and the intensive care occupancy recorded on April 2, 2020 (data from GitHub Repository COVID-19 2020). In the first two rows of Fig. 2 , from panel (a) to panel (f), the spatial distributions of the six risk indicators, multiplied by the population of each region, are reported as maps of colors and thus can be visually compared with the analogous maps of the three damage indicators, panels (g), (h) and (i) in the third row. In order to have more detailed information, in paragraph 2 of Methods we also report (see Table M2 ) the results of linear least square fit of each individual risk indicator with the damages. We found correlation coefficients ranging from 0.71 to 0.96, always higher than those observed as a function of the population, which can be considered the null model; however, the relative quadratic errors stay quite high (from 0.26 to 0.62). This suggests that some proper combination of these risk indicators could be able to capture in a better way the global risk associated with each region. In the next paragraph we propose a risk assessment framework able to realize such an objective. Conventional risk assessment theory relies on "Crichton's Risk Triangle" (Crichton 1999 , Kron 2002 , shown in panel (l) of Fig. 2 . In this framework, risk is evaluated as a function of three components: Hazard, Exposure and Vulnerability, and it is present when all of them co-exist in the same place. Hazard is the potential for an event to cause harm (e.g. earthquake, flooding, epidemics); Exposure measures the amount of assets exposed to harm (e.g. buildings, infrastructures, population); Vulnerability is the attitude of assets to be harmed when exposed to hazard events (e.g. building characteristics, drainage systems, age of population). Used for the first time in the insurance industry (Crichton 1999) , this approach has been extended to assess spatially distributed risks in many fields of disaster management, such as climate change impact (Tomlinson et al. 2011 , IPCC 2014 , Thomalla et al. 2006 , Kim et al. 2015 , Collins et al. 2009 , Estoque et al. 2020 ) and earthquakes (Babayev 2010) . For the purpose of this paper, we could consider Hazard as the degree of spread of the virus among the population of an Italian region, and that it is affected by a set of factors, related to spatial and socioeconomic characteristics of the region. Exposure is the amount of people who might potentially be infected by the virus as a consequence of the Hazard; thus, it should coincide with the size of the population of that region. Vulnerability, the third component of the risk, is the attitude of an infected person to become sick or die. In general, it is strongly related to the age and initial health conditions before the infection. Vulnerability can be combined with Exposure, to obtain a measure of the absolute damage (i.e. the number of people in a given region who become ill for pathologies related to the virus), which will be called Consequences. Notice that the approximately uniform spreading of the virus to the whole Italian territory, from its initial appearing and before the mobility restrictions, is an important assumption of the present study, and -as explained in the introduction -is supported by many evidences (see Giovanetti et al. 2020 , Pepe 2020 The Guardian 2020, Teralytics 2020). We do not consider here the transitory phase during which this spreading took place. In paragraph 3 of Methods we propose two models that differ in the way the risk indicators, introduced in the previous paragraph, are aggregated into the three components of the Crichton's risk triangle. In particular, we consider the E_HV model, where the effect of Hazard and Vulnerability are combined in a single affine function of the six indicators, and the E_H_V model, where Hazard and Vulnerability are considered as affine functions of, respectively, mobility index, housing concentration and healthcare density, on one hand, and air pollution, average winter temperature and age of population on the other hand (see Fig. 2 (m) for a summary). In both models the Exposure is represented by the population of each region. Furthermore, two versions of each model have been considered: an optimized one, where the weights of the risk indicators are obtained through a least-square fitting versus real COVID-19 data, and an a-priori one, where all the weights are assumed to be equal. Tables M3 and M4 in Methods, the models based on fitting the parameters give a better agreement, both in terms of relative mean quadratic error and correlation coefficient, with the data, as expected. In particular, E_H_V model gives the best agreement. Furthermore, in agreement with the strong correlation of the variables with the targets, most coefficients are positive. Indeed, all coefficients obtained by fitting the number of cases and the intensive care occupancy are positive, and only one negative coefficient appears in each model, when fitting the number of deceased. However, the numerical value of the coefficients strongly depend on both models and targets, making these models not very robust. On the other hand, the a-priori models are independent of the targets, depending only on the choice of the variables we decided to include in the risk evaluation. Among the two considered a-priori models, where all coefficients assume the same value, we observe that the E_H_V model produces a lower error with respect to real COVID-19 data and better correlation coefficients than the E_HV model, thus justifying the multiplicative approach which define the risk intensity in terms of Hazard times Vulnerability. Moreover, the peculiar aggregation of the risk indicators in the three components of the E_H_V model seems also in a better agreement with the reasons that led us to choose those indicators (as explained in Methods, paragraph 1). Once confirmed the advantage of adopting the a-priori E_H_V model, let us now build the corresponding regional risk ranking and validate the model with the regional COVID-19 data as a case study. In particular, following the scheme of Fig. 2 (m) , by multiplying Exposure and Vulnerability for the k-th region, we first recover the Consequences ( ! = ! • ! , k=1,…,20) then, by multiplying Hazard and Consequences, we will obtain the global risk index ! for each region ( ! = ! • ! , k=1,…, 20). In this respect, the risk index can be interpreted as the product of what is related to the causes of the virus diffusion in a given region ( ! ) and what is related to the effects on people ( ! ). In Fig. 3a we can appreciate the predictive power of our model by looking at the a-priori risk ranking of the Italian regions, compared with the COVID-19 data (GitHub 2020) about total cases (cumulated), deaths (cumulated) and intensive care occupancy (not cumulated), updated at April 2, 2020. The values of ! have been normalized to their maximum value, so that Lombardia results to have ! =1. The average of ! over all the regions results to be "# = 0.15 and can be considered approximately a reference level for the Italian country (even if, of course, it has only a relative value). As already explained, due to the intrinsic limitations of the official COVID-19 data, it is convenient to make the comparison at the aggregate level of groups of regions. Let us therefore arrange the 20 regions, listed in order of decreasing risk, in 4 color groups, depending on their level of damage in terms of total cases registered (less than 1000, between 1000 and 8000, between 8000 and 16000, more than 16000). With this choice, our model is clearly able to correctly identify the four northern regions where the epidemic effects are today far more evident: the first in the ranking, i.e. Lombardia (with a risk score about three times more than the second classified) and the group of the three regions immediately after it, Veneto, Piemonte and Emilia Romagna (even if not in the exact order of damage). A quite good agreement can be observed also for the other two groups: only for Sardegna the effects on both total cases and deaths seem to have been slightly overestimated (its insularity might play a role), while for other two regions, Umbria and Valle d'Aosta, one of the two damage indicators has been slightly underestimated. The clear separation between northern regions from those of Center and South is also confirmed by Fig. 3b , where the a-priori risk color map (middle panel) is compared with the map of COVID-19 total cases (left panel) and the map of the serious cases and deaths of the seasonal flu 2019/20 in Italy (right panel, ISS data -Epicentro 2020). The agreement is already visible at a first sight. In Fig. 4 we show the correlations between the a-priori risk index and the three main damage indicators related to the outbreak, i.e. the total number of cases (a), the total number of deaths (b) and the intensive care occupancy (c), at April 2, 2020. For each plot, a linear regression has been performed, with Pearson correlation coefficients always taking values greater than 0.95, indicating a strong positive correlation. On the right of each plot, the corresponding percentages of damage observed in the three Italian macro-regions (North, Center and South, see the geographic map (d)) are reported. Also in this case, compared with the percentage of cumulated a-priori risk associated to the same macro-regions (e), the correlation is evident. Another interesting way to visualize these correlations is to represent the a-priori risk index through its two main aggregated components, Hazard and Consequences, and plotting each region as a point of Fig. 5a , where the points have been also characterized by the same color of their membership group of Fig. 3 , indicating a different level of damage observed in the corresponding region at April 2, 2020. It is evident that the iso-risk line described by the equation C = R av / H (being R av = 0.15 the average regional risk value) is able to correctly separate the four more damaged, and highly risky, northern regions (plus Lazio) from all the others. The value of the risk index is reported in parentheses next to each region name. It is interesting to notice that, as shown in detail in Fig. 5b where the ranking of the Italian regions has been disaggregated for both Hazard and Consequences, some regions (such as Friuli, Trentino or Valle d'Aosta) are characterized by high values of Hazard but quite low values of Consequences, while for other regions (such as Campania or Piemonte the opposite is true (see also the colored geographic maps (c) and (d) for a visual comparison). This confirms that it is necessary to aggregate these two main components in a single global index to have a more reliable indication of the regional a-priori risk. Let us close this section by showing, in Fig. 6 , three sequences of the geographic distribution of the total cases (a), total number of deaths (b) and actual intensive care occupancy (c) as a function of time, from March 9, 2020 to April 2, 2020. These sequences are compared with the geographic map of the a-priori risk level (the bordered image on the right in each sequence the latter being independent of time. In all the plots, damages seem to spread over the regions with a variable intensity (expressed by the color scale) quite correctly predicted by our a-priori risk analysis. In the next section we will see that the methodology we are proposing in this paper, and in particular this kind of representation in terms of risk diagram, may be used to build a policy model aimed at mitigate damage in case of an epidemic outbreak like the COVID-19 one. We have seen that the risk can be divided in two components, one related to the causes of the infection diffusion and one concerning the consequences. In this section we interpret the consequences in terms of protection and required support to people with the goal of improving the social result and/or reducing the economic cost. It is evident that enhancing the capability of the healthcare system appears to be the most important action: basically, the insufficient carrying capacity creates the emergency. Beyond specific factors explained above, the epidemic crisis in Lombardia essentially showed a breakdown of its healthcare system, caused by high demand for hospital admissions, long permanence in intensive care, insufficient capacity of hospital assistance (diagnosis equipment, staff, spaces, etc.). Data illustrated in previous sections provide a positive analysis of an epidemic disease. The normative approach here described presents a viable framework to assess possible policy protocols. Several variables affecting the diffusion of an infection can be looked at as suitable policy instruments to manage both the spreading process and the stress level to the healthcare system of a given district (such as a country, a region, an urban area, etc.). Following the evidence suggested by data, we propose a theoretical model (whose details are presented in the section about Methods, subsection 4) based on two independent variables influencing the level of risk, namely the infection ratio, i.e., the proportion of infected individuals over the total population, and the number of per capita hospital beds, as a measure of the impact of consequences caused by the spreading of the disease. We adopt an approach based on a standard model of economic policy, in which a series of instruments explicitly affecting the infection ratio and the per capita hospital beds endowment can be used to approach the target, i.e., the minimization of the risk level. A similar rationale, covering other topics, can be found in Samuelson and Solow (1960) and builds upon a widely consolidated literature which dates back in time (Pigou, 1932 , Hotelling 1938 , Hicks 1939 , Samuelson 1947 and 1956 , Bator 1957 , among many others). Despite the analysis concerns a collective problem, the model here proposed describes elements of a possible decision process followed by an individual policy-maker, thus remaining microeconomic in nature. Panel (a) in Fig. 7 shows the risk function, while the right panel provides an illustration of the family of its convex contours, for a finite set of risk levels (limited for graphic convenience): Fig. 7 replicates the meaning of Fig. 5a by translating the consequences indicated by data as the required per capita hospital beds, while explaining that the position of each iso-risk curve corresponds to the different actual composition of the scenario at hand. We assume a unique care strategy based on the structural carrying capacity of the healthcare system, defined as the available number of per capita hospital beds. Such a carrying capacity derives from the health expenditure % , which is set to a level considered sufficient. Such a choice is based on political decisions and is reasonably inferred from past experience, structural elements of population, such as age and territorial density, etc. A part of the deliberated budget is dedicated to set up intensive care beds, as an advanced assistance service provision. During an emergency, possibly deriving from an epidemic spreading, the number of beds can suddenly reveal insufficient. In other words, it is possible that the amount of hospital beds required at a certain point is greater than the current availability. In the model, we assume the number of hospital beds, H, and the proportion of intensive care beds, , as exogenously determined by the policy-maker who fixes % . The actual carrying capacity is shown as a function of the infection ratio, x, computed as the infected population over the total, as shown in panel (c) of Fig. 7 , and detailed in subsection 4) of Methods. Changes in the proportion of per capita intensive care hospital beds over the total, cause instead, a variation in the slope of the line (which becomes steeper for reduction in the proportion of intensive care beds). Finally, changes in the overall expenditure shift the line with the same slope (above for increments of the expenditure). In particular, it is worth to notice that the political choice of the ratio = / may imply that the overall capacity to assist the entire population is not guaranteed (i.e. the intercept on the axis might be less than 1). A direct comparison of elements contained in panels (a-b) and (c-d) of Fig. 7 provides a quick inspection of the policy problem, focused to control the epidemic spreading. The constraint should be considered as a dynamic law, but since the speed of adjustment is reasonably low, we will proceed by means of a comparative statics perspective, in which a comparison of different strategies can be presented, by starting from different, static, scenarios. Further, by definition, an emergency challenges the usual policy settings, since the speed of damages is greater than that of policy tools. In panel (e) of Fig. 7 a hypothetic country has a given carrying capacity to sustain the risk level represented by the iso-risk curve. Without an immediate availability of funds to increase the carrying capacity, the main policy target could easily be described as the transposition of the iso-risk curve to the bottom-left: the closer the curve to the origin, the higher the satisfaction for the community. Secondly, the meaning of the relationship between the curve and the line is that until the curve touches the line, the policy maker has a sort of measure of how much the problem is out of control, given by the distance between the curve and the constraint. Third, policies may try to transpose the curve to lower levels or, equivalently, the constraint upwards (with or without modification of the slope). A minimal result is reached if both are at least tangent, as depicted in panel (f) of Fig. 7 . Whenever such a tangency condition has been reached, the highest infection rate that the given health care system can sustain has been found. Further policy actions are possible to approach a lower iso-risk curve or to save resources and/or re-allocate them differently. A policy can be considered satisfactory when any of points belonging to the arc TT' is reached, e.g. the point L. Alternative policies are neither equivalent, nor requiring the same actions, and the policy-maker has to choose actions with reference to the actual data collected by its own Country. Points F and G, although carrying the same risk level as E, still represent out-of-control positions. Different regions of the plot have a different signaling power: at point F, the infection rate is low and, thus, very difficult to be further reduced. In such a case, for example, it would be advisable to suggest health protocols which improve people safety. On the contrary, at point G, the infection rate is so high that strong direction to create a limit on social interaction easily appears to be much more urgent than rules of conduct or medical protocols. The right mix between a demand-side and a supply-side policy to adopt is a decision of political nature. A distinction can be made by saying that demand-side policies are devoted to reduce the number of newly infected people (by means of restrictions to movements, quarantine regulations, rules of conduct, etc.) and their effects are able to lower the iso-risk curves; supply-side policies are, instead, aimed at incrementing the carrying capacity of the system (by means of expenditure for the healthcare system, increments of dedicated personnel and intensive care beds, in-house medical protocols) and their effects can shift the constraint representing the carrying capacity of the system. Politics has, then, to decide when the risk is low enough or the constraint is sufficiently high. Specific calibration of the model will allow, in a forthcoming research, a detailed analysis of policy implications, by considering actual conditions and risk factors of specific districts, thus providing the policy-maker with a toolbox for normative directions. For instance, the model can be read to analyze differences in proposed actions in Lombardia and Veneto, and in other regions or countries. We have shown how a data-driven epidemic risk analysis, that accounts for a proper combination of a set of cofactors, can contribute to understanding the highly inhomogeneous spread of COVID-19 in Italy in terms of a different a-priori risk exposure of different geographical areas. Regions such as Lombardia, Veneto, Piemonte and Emilia Romagna result indeed in the first positions of our proposed a-priori risk rankings, which consists of three main components, Hazard, Exposure and Vulnerability, related, directly or indirectly, to the probability of spreading of a virus and of its harming ability. We have evaluated these three components by using historical available data on various factors that can contribute. In the second part of the paper we then advanced a theoretical policy model that provides an intuitive instrumental toolbox on complex problems, as those deriving from an epidemic emergency. While preserving simplicity, the model is able to depict various scenarios according to actual data and can help designing policy strategies fitting the situation at hand. In particular, by means of a reliable sampling procedure, collected data can be used as an input to the model, thus providing policy-makers with the possibility to take informed decisions and actions. As an example, different relaxation strategies of the lockdown could be adopted in areas with different a-priori risk. In conclusion, our work is a first attempt to consider together different factors that can contribute to evaluate the epidemic risk in a geographical area. Better medical knowledge and available data will be important to further refine and improve the proposed methodology, which could also be easily applied to other countries. The Pearson correlation coefficients are very good, always greater than 0.95. The corresponding percentages of damages, aggregated for the three Italian macro-regions (North, Center and South (d)) are also reported to the right and can be compared with the percentages of cumulated a-priori risk (e). It is clear that our a-priori risk index is able to explain the anomalous damage discrepancies between these different parts of Italy. b a c e d Figure 5 : (a) Risk Diagram. Each region is represented as a point in the plane { × } while the color is proportional to the level of damage in that region updated at April 2, 2020. The most damaged regions lie with a good approximation above the C = Rav /H hyperbole (i.e. the iso-risk line related to the average regional risk index), while the less damaged ones lie below this line. The a-priori risk index score is also reported for each region. This paragraph presents all the indicators we used, the rationale and references that justify their selection. Table M1 reports the data used for each region, their definition, unit of measure and relevant source. Mobility index: Commuting data are often used to correlate population mobility and the spreading of an infection (Charaudeau et al. 2014 ). On the other hand, many just published papers have monitored to what extent people are complying to issued travel restrictions (Klein et al. 2020 ) and if they are proofing to be effective in the reduction of the Covid-19 epidemic spreading (Fang et al. 2020 ). According to available data, the average trip rate of mobile population in Italy is 2.50 per day and the average distance covered is 28.5 km per day (ISFORT 2019). We characterize each region with a "mobility index" as the regional average of the ratio between the sum of commuting flows (incoming and outgoing) for each municipality and its employed population. The data source is the Italian Ministry of Economic Policy Planning and Coordination (www.urbanindex.it). Housing concentration: Urbanization increasingly affects the epidemiological characteristics of infectious disease (Alirool et al. 2011 , Stier et al. 2020 . Close proximity of people in their short range mobility and the attitude to use crowded public transport is amplified in compact and dense cities (Mo et al. 2020) . We capture those circumstances by the "Housing Concentration", measured as the ratio between the number of houses classified as "non detached houses" and the total number of houses. The data source is the same database cited for the mobility index. Healthcare density: Delayed hospital admission, misdiagnosis, unsuitable air conditioning systems, lack of infected patients segregation and inter-hospitals transfers, all might contribute in what it is commonly called super-spreading event (Stein 2011) . We measure the potential occurrence of this events in the infection spreading by including the "Healthcare Density" as the number of hospital beds per 10.000 inhabitants at regional level. Data source is the website of the Ministry of Health (http://www.dati.salute.gov.it/dati/dettaglioDataset.jsp?menu=dati&idPag=17). Pollution: Long-term exposure to air pollution may be one of the most important contributors to fatality caused by the COVID-19 in Europe (Ogen 2020) and in northern Italy Conticini et al. 2020) . Particulate matter (PM) is able to penetrate deeply into the respiratory tract and increase the risk of respiratory diseases (Cui et al. 2003) . According to the European Environment Agency (EEA Report No 10/2019), PM concentration in 2016 were responsible for about 374.000 premature deaths in the EU-28. Recently has been evidenced that PM10 determines a hyperactivation of JAK/STAT protein family that is associated with cells proliferation and survival (Reyes-Zaràte et al. 2016) . JAK/STAT is also hyperactivated by several cytokines generated during the Covid-19 infection (Stebbing et al. 2020) . For these reasons a strengthening mechanism between PM10 and Covid-19 infection could be assumed. Besides, very recent studies are directly correlating the population exposed to particulate pollution and the contagion from COVID-19 and the consequent health damage (Setti et al. 2020 , Pansini and Fornacca 2020 , Wu et al. 2020 . Based on these premises we decided to include the PM10 annual average of the mean daily concentration as a factor influencing the vulnerability of people exposed to the infection. The data source is WHO (https://www.who.int/airpollution/data/cities/en/), that provides measures at urban background, residential areas, commercial and mixed areas for the period 2013-2016. Temperature: Weather plays a role in the spread of 2019-nCoV (Bukhari and Jameel 2020, Pirouz et al. 2020) , although not fully established. Chan et al. (2011) report that a low temperature and low humidity environment may facilitate the virus transmission in subtropical areas during the spring and in air-conditioned environments. It is also commonly accepted that cold cuts down the defense barriers of the respiratory tracts (Chowel et al. 2012 , Makinen et al. 2009 ). We decided to include the average winter (from December 2016 to April 2017) daily mean temperature in each region as a factor potentially enhancing the individual vulnerability. The source of data is the Italian Ministry of Agriculture (https://www.politicheagricole.it/). Most of the official data sources report more severe impacts of 2019-nCoV on elderly people, probably both for an intrinsic weakness of their immunity system and for the co-existence of other chronic pathologies. We use the ratio between the population over 60 and the total population to take into account this vulnerability factor. Data source is the same as population, i.e. ISTAT database (www.istat.it/it/archivio/104317). Population: Of course, as anyone who is infected might get ill, it is straightforward to use the total population of each region that could be affected by the infection as a measure of the risk exposure. We also use the population as a multiplying factor of each risk indicator when measuring its degree of correlation to the damage of each region (see first paragraph in the section Results). About 43% of the population is concentrated in the five regions of Northern Italy and one out of six in Lombardia. Data on regional population are available on ISTAT database (www.istat.it/it/archivio/104317). The seven risk indicators under consideration are named below, together with their reference interval: These variables are suitably normalized between 0 and 1 as: where min( $ ) and max( $ ) are, respectively, the minimum and the maximum value assumed by each variable $ in its own reference interval. The new normalized variables are also dimensionless. Notice that the normalization is different for the population, since we want to avoid values equal to zero, and for the temperature, since, at variance with all other quantities, we expect that the risk increases with the decrease of the temperature. The first thing is to check if any of the six risk indicators, & , … , ' , each considered separately, can fit any of the targets, ( , = 1, 2, 3 i.e. our three damage indicators: cumulative number of cases, cumulative number of deceased and number of hospitalized in intensive care at April 2, 2020. For each variable $ , = 1, … ,6, we consider a risk $,( = * * $,( * $ , and each $,( is determined by matching the target ( in the least square sense. In particular, we perform a linear least square fit, minimizing the following quadratic error : for the i-th risk indicator with respect to the l-th damage indicator. In this expression = 20 is the number of regions and (! denotes the damage indicator ( = 1, total cases, = 2, number of deceased, = 3, intensive care occupancy) for region . The relative mean quadratic error is defined as The result is summarized in Table M2 . It appears that each single parameter correlates with all three targets (respectively number of cases, deceased and hospitalized in intensive care) well above the obvious correlation coefficient of the population, shown in the first column, however the correlation are not strikingly high, and the mean quadratic error is not so small. The goal of this paragraph is that of choosing the best model of aggregation of the risk indicators presented before in the framework of the Crichton's Risk Triangle (see Results section). We observe that, in any model of this kind, the risk has to be necessarily proportional to the exposure, represented by the population. Therefore, we will assume that the risk R is given by the product of Exposure E times a given function F HV of the other parameters, related with Hazard and Vulnerability: In this paragraph we propose and compare the following models, in order to understand which one is most suited for a robust evaluation of the risk. Here the effect of Hazard and Vulnerability are combined in a single affine function of the parameters. We assume a dependence of the risk of the form: ._%-= * %- The coefficients %-, & , … , ' , in turn can be: a) obtained by a least-square fitting; b) assigned a-priori with %-= 0 and all the $ , = 1, … ,6 , assumed to be equal. 2) E_H_V Multiplicative Model. Here, Hazard and Vulnerability are considered as affine functions of, respectively, & , , , 0 and 1 , 2 , ' . We assume that %-is the product of Hazard and Vulnerability, i.e.: ._%_-= * %-= * * , (1) Again, % , -, * , … , ' can be: a) obtained by a least-square fitting; b) assigned a priori, by setting % = 0, -= 0, and with all the $ , = 1, … ,6, assumed to be equal. As in the previous paragraph, we shall compare these four models (1.a, 1.b, 2.a and 2.b) versus the three types of targets available, ( ( = 1, 2, 3), represented by our damage indicators: cumulative number of cases, cumulative number of deceased or number of hospitalized in intensive care, all registered at April 2, 2020. In particular, for models 1.b and 2.b we adopt a linear least square fit, while, for determining the optimal parameters of models 1.a and 2.a we perform a nonlinear least square best-fit, by trying to fit the total number of cases in each region up to April 2, 2020. Because in the E_H_V model the dependence of the risk on , and V is multiplicative, we may add two normalization conditions in order to avoid infinite solutions, for example: 1 + 2 + ' = 1. For all the models, we minimize the error: with respect to the parameters. In this expression = 20 is the number of regions, (! denotes the damage indicator ( = 1, total cases, = 2, number of deceased, = 3, intensive care occupancy) for region , ! indicates the population of region , and the function %-depends on the model considered. The relative mean quadratic error is defined as The results of the best-fit with the four models are summarized in Table M3 , while the coefficients of the fitting parameters, normalized so that the sum of their absolute value is 2, are reported in Table M4 for both E_HV (1.a) and E_H_V (2.a) models. Table M3 : Relative average quadratic error and correlation coefficients for E_HV and E_H_V models, for both the versions with data fitting and with a-priori coefficients. Table M4 : Normalized coefficients $ , = 1, … ,6 of both E_HV and E_V_H models, in the versions computed by fitting the data (1.a and 2.a). Most coefficients are positive, however their numerical value strongly depends on the model and on the fitted data. We propose a theoretical framework to discuss the policy problem. The risk of a community has been described above as depending on several components. We now adopt a simplification and consider the whole set of possible elements reconciled on two aspects, namely the proportion of infected individuals over the total population, which we call here infection ratio, , and the impact of consequences caused by the spreading of the disease, measured as the number of per capita hospital beds, , required by the emergency situation. Without loss of generality, we assume, for the sake of simplicity, that the aboveexplained negative role played by hospitals as contagion spreading factors, can be neglected here. Thus, define such a simplified notion of risk as : (0,1) × ℝ → (0,1) be a , function, determined by ∈ (0,1) and ∈ ℝ, i.e., = ( , ). Consistently with previous analysis, we assume that The level of risk is the target variable that the policy-maker tries to minimize, given the constraint constituted by the current carrying capacity, i.e., the endowment of hospital beds ( ) financed by the expenditure in the healthcare system % . In principle, it may be considered as a dynamic variable, but we will proceed with a comparative-static analysis, by presenting policy intuitions from different initial configurations. Thus, the constraint is the result of the political orientation of the Government. In particular, let the part of the global allowance dedicated to intensive care beds, , be the sole remedy to the epidemy and define = , ∈ (0,1). The current carrying capacity of the healthcare system, i.e. the available number of hospital beds, is the function ∶ (0,1) , × ℕ × ℝ → (0,1), defined as = − ℎ , where ℎ = (1 − ) and is the population of the district under consideration. In per capita terms, it can be rewritten as = % − ℎ , with = / and % = / . Within a comparative statics perspective, for any given couple ( , ) we can consider a reduced form of the carrying capacity constraint, depending on the infection diffusion rate as a negatively sloped line in the plane (infection ratio, hospital beds), where also the convex contours of the risk function can be considered. It is worth to notice that the model refers to the variable hospital beds in both demand, , and supply, , terms. All in all, the proposed framework matches required and available beds per infection ratio. Policies may try to affect and can effectively tune , by adjusting and % , in such a way that the exploitation of available resources allows the minimization of , at least, a tangency point between the capacity constraint and the risk level. Urbanisation and infectious diseases in a globalised world. The Lancet infectious diseases Scenario-based earthquake hazard and risk assessment for Baku (Azerbaijan) The Simple Analytics of Welfare Maximization Will Coronavirus Pandemic Diminish by Summer Data analysis on coronavirus spreading by macroscopic growth laws Predictability: Can the turning point and end of an expanding epidemic be precisely forecast? The effects of temperature and relative humidity on the viability of the SARS coronavirus Commuter mobility and the spread of infectious diseases: application to influenza in France The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak The influence of climatic conditions on the transmission dynamics of the 2009 A/H1N1 influenza pandemic in Chile Vulnerability to environmental hazards in the Ciudad Juárez (Mexico)-El Paso (USA) metropolis: a model for spatial risk assessment in transnational context Can atmospheric pollution be considered a co-factor in extremely high level of SARS-CoV-2 lethality in northen Italy? Enviromental pollution Natural disaster management: a presentation to commemorate the International Decade for Natural Disaster Reduction(IDNDR), 1990-2000Ingleton J: Tudor Rose Air pollution and case fatality of SARS in the People's Republic of China: an ecologic study Host Species Restriction of Middle East Respiratory Syndrome Coronavirus through Its Receptor, Dipeptidyl Peptidase 4 Heat health risk assessment in Philippine cities using remotely sensed data and social-ecological indicators Analysis and forecast of COVID-19 spreading in China Human mobility restrictions and the spread of the novel coronavirus (2019-ncov) in china (No. w26906) Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy A doubt of multiple introduction of SARS-CoV-2 in Italy: A preliminary overview GitHub Repository COVID-19 2020, coming from Italian Health Ministery The Foundations of Welfare Economics SARS-COV-2 cells entry depends on ACE2 and TMPORSS2 and is blocked by a clinically proven protease inhibitor The General Welfare in Relation to Problems of Taxation and of Railway and Utility Rates Synthesis Report, Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change 16° Rapporto sulla mobilità degli italiani Assessment of drought hazard, vulnerability, and risk: A case study for administrative districts in South Korea Assessing changes in commuting and individual mobility in major metropolitan areas in the United States during the COVID-19 outbreak Flood risk = hazard x exposure x vulnerability How macroscopic laws describe complex dynamics: asymptomatic population and Covid-19 spreading Machine Learning the Phenomenology o f COVID-19 From Early Infection Dynamics, medRxiv preprint doi Cold temperature and low humidity are associated with increased occurrence of respiratory tract infections Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study Report of the Italian Health Ministery on daily mortality surveillance Assessing nitrogen dioxide (NO2) levels as a contributing factor to the coronavirus (COVID-19) fatality rate. Science of The Total Environment Initial evidence of higher morbidity and mortality due to SARS-CoV-2 in regions with lower air quality. medRxiv The severe acute respiratory syndrome COVID-19 outbreak response: first assessment of mobility changes in Italy following lockdown The Economics of Welfare Lessons learnt from 288 COVID-19 international cases: importations over time, effect of interventions, underdetection of imported cases Relationship between Average Daily Temperature and Average Cumulative Daily Rate of Confirmed Cases of COVID-19 Atmospheric particulate matter (PM10) exposure-induced cell cycle arrest and apoptosis evasion through STAT3 activation via PKCζ and Src kinases in lung cells Foundations of Economic Analysis Social Indifference Curves Analytical Aspects of Anti-Inflation Policy Position paper: Relazione circa l'effetto dell'inquinamento da particolato atmosferico e la diffusione di virus nella popolazione COVID-19: combining antiviral and anti-inflammatory treatments Super-spreaders in infectious diseases COVID-19 attack rate increases with city size Teralytics 2020 (www.teralytics.net) -data The Guardian 2020 Reducing hazard vulnerability: towards a common approach between disaster risk reduction and climate adaptation Including the urban heat island in spatial heat health risk assessment strategies: a case study for Birmingham Worldometer 2020 Exposure to air pollution and COVID-19 mortality in the United States. medRxiv Novel 2019 coronavirus genome Bi-stability of SUDR+K model of epidemics and test kits applied to COVID-19 Acknowlegments AEB, AP and AR acknowledge financial support of the national project PRIN 2017WZFTZP Stochastic forecasting in complex systems. The authors would like to thank Christian Mulder for his valuable comments and suggestions.