key: cord-0924104-ub2rg4wc authors: Bucci, A.; Ippoliti, L.; Valentini, P.; Fontanella, S. title: Clustering spatio-temporal series of confirmed COVID-19 deaths in Europe date: 2021-10-06 journal: Spat Stat DOI: 10.1016/j.spasta.2021.100543 sha: 31d88ba10b24f286abaa8adc9395fde9ce6d5dca doc_id: 924104 cord_uid: ub2rg4wc The impact of the COVID-19 pandemic varied significantly across different countries, with important consequences in the definition of control and response strategies. In this work, to investigate the heterogeneity of this crisis, we analyse the spatial patterns of deaths attributed to COVID-19 in several European countries. To this end, we propose a Bayesian nonparametric approach, based on mixture of Gaussian processes coupled with Dirichlet process, to group the COVID-19 mortality curves. The model provides a flexible framework for the analysis of time series data, allowing the inclusion in the clustering procedure of different features of the series, such as spatial correlations, time varying parameters and measurement errors. We evaluate the proposed methodology on the death counts recorded at NUTS-2 regional level for several European countries in the period from March 2020 to February 2021. Since the officially reported outbreak in China of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), best known as coronavirus disease 2019 , the world has been facing an international public health emergency. The rapid and global spread of the disease prompted an extensive range of government interventions across the whole world in an attempt to contain the spread of further infection and prevent health system strain. These measures, such as the detection and isolation of infected individuals, contacttracing, quarantine measures, social distancing, and closure of non-essential businesses, have since become significant components of public health policies. While Hsiang et al. (2020) have shown that these interventions helped prevent or delay more than 500 million infections across several countries, data on their efficacy are conflicting and many questions modelled as a mixture of Gaussian processes (GP), with a Dirichlet process (DP) prior (Ferguson, 1973 ) over mixture components. This procedure is appealing on several grounds as, for example, the number of clusters does not need to be determined in advance, but is automatically selected during the clustering procedure within a suitable Markov chain Monte Carlo (MCMC) algorithm. Furthermore, the Gaussian mixture is defined through the observation equation of a state-space model (West and Harrison, 1997) , which, not only allows for time varying parameters, but also accommodates for temporal trends, spatial correlations and measurement noise. The remainder of the paper is organised as follows. In Section 2, we offer a simple exploratory analysis of the data. The methodological contribution is outlined in Section 3 and results of the study are presented in Section 4. Finally, Section 5 concludes the paper with a brief discussion. To investigate the dynamics of the pandemic, we consider data for a period of 52 weeks (18 March 2020 -18 February 2021) on COVID-19 confirmed deaths recorded for 145 NUTS-2 Regions in France, Germany, Italy, Spain, Switzerland and United Kingdom 1 . Data were collected from National repositories as reported in Table 1 . Source Link Weekly counts of deaths are a key indicator of overall epidemic impact and trajectory and may help improving data quality respect to daily data. Thereby, in our analysis, daily data have been aggregated with a weekly frequency to account for differences in the recording mechanism among countries and to filter out weekend effects. The raw data, stratified by country, are depicted in Figure 1 . The temporal evolution of the pandemic highlights a clear trend across countries, characterised by two or three peaks in the periods March -May 2020 and October 2020 -March 2021 and by low death counts for all regions during the summer. As expected, the time series within the same country present similar patterns; however, high variability can be observed in all the country-level plot, especially for the second and third waves. This suggests that the impact of the pandemic has been highly asymmetric within countries and that there might exist a supranational pattern across regions. This seems to be confirmed to some extent by Figure 2 which shows the time series correlations as a function of the spatial distance computed by means of the region centroids. As it can be noticed, although the correlations tend to decay with the spatial distance, high correlation values can still be observed at larger distances. This feature suggests that discriminative methods based on a pairwise spatial similarity metric are not suitable in this context. For this reason, in the following, we adopt a modelling strategy that, while explicitly accounting for the spatial correlation in the model specification, does not directly consider the spatial effect in the definition of the clustering structure. Furthermore, to enhance comparability among the regions, we examine standardized time series (i.e., for Figure 1 : Weekly time series of deaths (per million inhabitants) by COVID for each Country each time series we subtract its mean and divide by its standard deviation) maintaining the shape of their patterns. This allows comparable ranges focusing on the structural similarities of the regions rather than just on the amplitude levels of their time series. Following Section 2, we consider population level summaries of confirmed COVID-19 deaths collected over time over a fixed study area, D. The complete set of information is denoted as {Y t (s i ), s i ∈ D}, i = 1, . . . , n; t = 1, . . . , T, where n is the number of areal units, s i ∈ D, and T is the length of the time series. We also assume that at the first level of the hierarchy, the model for the (standardized) data is given by where η t (s i ) is the true (latent) process of interest and ε t (s i ) is a spatially and temporally uncorrelated Gaussian error term, assumed to have zero mean and time varying variance, i.e. ε t (s i ) ∼ N 0, σ 2 ε,t . This error term corresponds to measurement error and/or representativeness error associated with the true process η t (s i ). In particular, the fact that there may be deaths misattributed to COVID-19 and, conversely, others not attributed to COVID-19, some uncertainty is expected surrounding the true number of COVID-19 deaths. Because of this, it is desired to predict the smooth process η t (s i ) rather than the observed noisy process Y t (s i ). With the aim of recognising similarities and differences in the shape of the patterns of the weekly time series, a finite mixture is used as flexible model. The central assumption here is that the n time series arise from M n hidden classes and, within each class, all time series can be characterized by a common data generating mechanism which is defined in terms of a probability distribution for the entire time series, depending on unknown classspecific parameters φ m . Accordingly, denoting with η(s i ) = η 1 (s i ), η 2 (s i ), . . . , η T (s i ) , the (T × 1) vector of time series of the latent process η at areal unit s i , the density of the mixture is written as where we adopt a standard truncated Dirichlet process model to define the prior over the mixing probabilities, π m , based on some (large) upper bound M − see, Ishwaran and James (2002) . A class assignment index Z i taking a value in the set {1, . . . , M } is then introduced for each time series, η(s i ), to indicate which class the time series belongs to: · · · · · · · · · · · · · · · · · · · · · · · · f η(s The class assignment indices Z = {1, . . . , M } are random and independently distributed apriori, with prior class assignment distribution being the same for all sites, such that, P (Z i = m | π) = π m , m = 1, . . . , M . The weights in π = (π 1 , . . . , π M ), with 0 ≤ π m ≤ 1 and π m = 1, are assumed to be unknown model parameters estimated along with the data. An important aspect of the model is that we do not assume to know a priori which time series belongs to which group. For each time series, the group indicator variable Z i is estimated along with the group-specific parameters φ = φ 1 , . . . , φ M from the data. Finally, ψ summarizes unknown parameters in the probabilistic distribution that are not related with the group indicators. Assume that the φ m are independently drawn from an uncertain prior distribution G, where the uncertainty about G is expressed through a Dirichlet process model. Then, the mixture formulation gives the following standard hierarchical model (Escobar and West, 1995) η(s i ) | φ m ∼ N µ m , Σ m , where the means µ m and variances Σ m determine the parameters φ m = µ m , Σ m , and The Dirichlet process is thus defined by a distribution function G 0 which is the prior expectation of G, so that E G(φ) = G 0 (φ), and α, a precision parameter determining the concentration of the prior for G about G 0 . From the Pólya Urn Scheme it also follows that where δ φ k (φ m ) denotes a unit point mass distribution at φ = φ m − see, for example, Escobar and West (1995) and Manolopoulou et al. (2010) . The Dirichlet process prior allows to construct the weights for model classes following a stick-breaking representation (Sethuraman, 1994) , where the π 1 , π 2 , . . . are obtained iteratively as follows The variables V 1 , V 2 , . . . represent a sequence of independent Beta(1, α) random variables for which we have k < M and V M = 1 to ensure that π m = 1. The coefficient α tunes the number of clusters in a direct manner, with larger values implying a larger number of clusters a priori. In particular, as α tends to zero, most of the samples φ m share the same value, whereas when α tends to infinity, the φ m are almost i.i.d. samples from G 0 . Placing a prior on α thus allows us to draw inferences about the number of mixture components; a typical choice (Ishwaran and James, 2002) is to assume a Gamma prior, i.e. α ∼ Ga(ς 1 , ς 2 ), where E α = ς 1 /ς 2 . Choosing a large value for ς 2 is particularly relevant, because it encourages clustering; Escobar and West (1995) suggest α ∼ Ga(2, 4), with E α = 0.5. The prior specification for each component m, completes with an appropriate choice for G 0 (φ m ). In the following, since we aim to group areas together that exhibit similar temporal trends, we are only interested in modelling the mean with a mixture distribution. This is easily accommodated within the framework here by setting Σ 1 = Σ 2 = . . . = Σ M to equal some parameter Σ. This parametrization has been studied widely in literature and references can be found, for example, in West (1992) , West and Cao (1992) and Ishwaran and Zarepour (2000) . To specify the form of the mean we consider the following state space model where µ t = µ t,1 , . . . , µ t,M is a (M × 1) state vector, u t is a zero mean temporally uncorrelated error with (M × M ) covariance matrix Σ u = diag(σ 2 u,1 , . . . , σ 2 u,M ) and h(s i ) is an (M × 1) incidence vector where its m-th element is equal to 1 if Z i = m and zero otherwise. Hence, conditionally on the class assignment variables where µ t,m is the mth element of µ t and the error terms ν t (s i ) are uncorrelated in time but correlated in space. In particular, the (n × 1) vector ν t = ν t (s 1 ), . . . , ν t (s n ) , denotes a zero mean spatial Conditional Autoregressive (CAR) process (Cressie, 1993) . . , σ −2 νn,t , σ 2 ν i ,t are conditional variances and C t is a matrix which captures the spatial dependence at time t, with elements c ij satisfying the condition c ij σ 2 ν j ,t = c ji σ 2 ν i ,t − for a discussion on the symmetry conditions see, for example, Ippoliti et al. (2018) . A possible choice for C t requires first the specification of W, a symmetric regional neighbour-incidence matrix (with elements w i,j equal to 1 6 J o u r n a l P r e -p r o o f if regions i and j are adjacent and 0 otherwise) and then the definition of D t as D t = σ −2 ν,t D, where D is a diagonal matrix whose d i element is the i-th row sum of W, i.e. d i = w i+ . This specification implies that σ 2 ν i ,t = σ 2 ν,t /w i+ and that C t = ρ t D −1 W where ρ t is a spatial dependence parameter. For simplicity, it is assumed here that each region has at least one neighbour and that, for isolated regions or islands, the set of neighbours is determined by the regions within a distance of 150 kilometres. The use of disconnected graphs (i.e., a graph containing a singleton node/region with no neighbours or a graph split in different sub-graphs) for the CAR specification is not straightforward (Freni-Sterrantino et al., 2018) and it is not considered here. Also, we note that, although we consider the time series to be spatially correlated, using our approach it is possible to cluster into a single group multiple time series that have similar temporal patterns, but that are located far away from each other. This is in contrast with other approaches that, by imposing Markov random field (MRF) priors on the mixture parameters (Blekas et al., 2007) or on the cluster assignment variables (Jiang and Serban, 2012) , force the spatial clustering of the series. The state space model described above represents a local level model where the mean of η t (s i ) is group specific and varies over time. If µ m = µ 1,m , . . . , µ T,m is considered as a vector of parameters then, equation (6) can be interpreted as defining a hierarchical prior for µ m , m = 1, . . . , M . In particular, if is a square first difference matrix, and u m = u 1,m , . . . , u T,m , we can write for the initial conditions. In words, the stochastic trend term has a variance which is increasing with time (and thus can wander over an increasing wide range), but µ t,m changes only gradually over time being consistent with the assumption that µ t,m and µ t−1,m tend to be close to one another. It is thus worth noting that the state equation can be interpreted as already providing us with a prior for µ t,m implying that µ t+1,m | µ t,m ∼ N µ t,m , σ 2 u,m with f µ m |σ 2 u,m = T t=1 f (µ t+1,m | µ t,m , σ 2 u,m ). This is an example of a hierarchical prior, since the prior for µ m depends on σ 2 u,m which, in turn, requires its own prior. Finally, for estimation purposes, it is also worth considering that, by isolating each variable ν t (s i ) in turn, and considering the n individual conditional distributions ν t (s i )|ν t (−s i ), i = 1, . . . , n, at each site, where ν t (−s i ) denotes all other values than ν t (s i ), the general CAR specification described above for ν t implies that for spatial coefficients c i,j = ρ t w i,j /w i,+ and d i = w i,+ . A special case is the Intrinsic Conditional Autoregressive (ICAR) model which uses d i = w i,+ and ρ t = 1 (Cressie, 1993) . Incidentally we note that equation (8), by Brook's Lemma (Brook, 1964) , gives rise to a pseudo-likelihood approximation of the joint distribution of η t (s i ), which is efficient for simple Gaussian fields − see, Besag (1975) and Besag (1977) . J o u r n a l P r e -p r o o f Assume that θ is a vector collecting all model parameters. Estimation of θ is based on the posterior distribution f (θ|Y ) of θ given the data Y . The posterior distribution f (θ|Y ) is defined through Bayes' theorem as the product of the observed-data likelihood function f (Y |θ) and the prior f (θ). Under this framework, we use Markov chain Monte Carlo (MCMC) methods and posterior inference is obtained by implementing a Gibbs sampler. Computationally, we only need to calculate the full conditionals of each parameter given all other parameters, which is usually not hard. In the following, we provide details for the relevant conditional distributions. First, it is important to recall (see, for example, Sahu and Mardia, 2005) that where H is a (n × M ) matrix with ith row equal to h(s i ) . It thus follows that the full conditional distribution of η t is the multivariate Normal distribution N Σ t ξ t , Σ t where Then, using equation (8), we can thus write Conditional on η t (s i ), the state equation µ t of model (5)-(6) can be sampled by means of the well-known forward filtering-backward sampling algorithm of Carter and Kohn (1994) or Durbin and Koopman (2002) . A further important part of the estimation refers to the sampling of α whose posterior only depends on the data through the variables V k . Accordingly, α has the usual posterior distribution (Ishwaran and James, 2002) The posterior for the class assignment indices Z is multinomial with probabilities Furthermore, since the precision matrix of the CAR process can be time dependent, we specify an additional state-space model. In particular, for the conditional variance σ 2 ν,t , we introduce the following representation where the i-th element of η * * t is η * * t (s i ) = log η * t (s i ) 2 + c , η * t (s i ) denotes the i-th element of η * t = D 1/2 (I − C t ) 1/2 (η t − Hµ t ), g ν,t = 2 log(σ ν,t ), e ν,t is a (n × 1) vector of independent errors, v ν,t is N (0, σ 2 ν ) and is independent of ν t and u t . The constant c is an offset that we set at 0.001. The system in this form has a linear, but non-Gaussian state space form, because the innovations in the measurement equations are distributed as a log χ 2 (1). As described in 8 J o u r n a l P r e -p r o o f Kim et al. (1998) , in order to further transform the system in a Gaussian one, a mixture of seven Normals is used as an approximation of the log χ 2 . Let S it ∈ {1, 2, . . . , 7} denote the indicator variable of the i-th Normal from which e ν,t (s i ) is drawn from, and let S i = (S i1 , . . . , S iT ) and S = (S 1 , . . . , S n ) be component indicators for all the elements of e ν,t . Conditional on S (and the other parameters), Equations (9) and (10) define a Gaussian statespace model and, hence, the algorithm by Durbin and Koopman (2002) can be used to draw g ν,t . In particular, let q j , be the component probability of the j-th Normal of the mixture with mean, m j , and variance v 2 j , for j = 1, . . . , 7 -see, Kim et al., 1998 . Then for j = 1, . . . , 7, i = 1, . . . , n, and t = 1, . . . , T . Regarding the evolution of the spatial correlation parameter, ρ t , we specify the following state equation where ψ t ∼ N (0, σ 2 ψ ), is independent of u t , ν t and v t . Equation (12) is combined with the measurement equation defined in (8), so that the algorithm of Durbin and Koopman (2002) can be used to draw the states. Finally, posterior simulation of σ 2 ε,t follows closely that of σ 2 ) 2 + c and τ ε,t = 2 log(σ ε,t ). Then, the state space model with measurement and state equations is specified by As for g ν,t , the procedures described by Kim et al. (1998) and Durbin and Koopman (2002) are used to draw τ ε,t . Here, we complete the statistical analysis of the data introduced in Section 2 by applying the methodology discussed in Section 3 to group the 145 regions into distinct communities based on their temporal patterns. In particular, we fit several competing models by considering σ 2 ε,t , σ 2 ν,t and ρ t either as time-varying parameters or as constant through time. Overall, we evaluate 2 3 different model parametrizations. For all the fitted models, the MCMC algorithm is run for 60, 000 iterations and posterior inference is based on the last 10, 000 draws using every 5th member of the chain to avoid autocorrelation within the sampled values. Convergence of the chains of the model is monitored visually through trace plots as well as using the R-statistic of Gelman (1996) . When convergence is attained, to avoid label switching (Stephens, 2000) , we follow Nieto-Barajas and Contreras-Cristan (2014) and Dahal (2006) and choose the representative clustering structure which minimises the deviation from a pairwise clustering matrix which, taking into account all MCMC iterations, provides an estimate of the probability that the two time series belong to the same cluster. Then, in order to compare among cluster structures resulting from different model specifications, we use the following heterogeneity measure as possible. Additionally, for model comparison, we also evaluate a measure of goodness of fit. In particular, we consider the logarithm of the pseudo marginal likelihood (LPML) which requires the computation of the Conditional Predictive Ordinate (CPO) statistics (Geisser and Eddy, 1979; Mukhopadhyay and Gelfand, 1997 ) ε,T . Given posterior samples of η (l) (s i ) and Σ (1) ε , l = 1, . . . , L, the LPML is a predictive measure of model performance with larger values indicating a better model fit. By considering the model selection criteria described above, Table 2 suggests that models M 5 and M 7 do not provide the best fit of the data, but produce the most homogeneous clusters. On the other hand, a better fit can be obtained by using models M 2 and M 6 which also give the lowest number of clusters. Accordingly, a good balance between heterogeneity and goodness of fit seems to be offered by model M 2 , which is characterized by the highest LPML and a good value of the heterogeneity measure HM. For this model, the regions are grouped in M = 12 clusters, both the conditional variance and the spatial parameter of the CAR are time varying parameters, and the variance of the measurement error is constant. The dynamics of the time-varying parameters, with 95% credible intervals, are shown in Figures 3 and 4 . The pattern of the conditional variance σ 2 ν,t , coherently with the data, highlights three peaks overall the study period. There are also nearly zero values between the end of May and the end of September, denoting small individual (regional) effects in this period. There is the hint that this might be due to the implementation of national and subnational strict measures during the first phase and favourable climate conditions, though this is difficult to prove. The dynamic of ρ t shows that, on average, the spatial dependence parameter varies between 0.85 and 0.92 reaching its minimum value in the second week of May. Although the range of values appears limited, its plot suggests that the spatial correlation increases with the resumption of social and economic activities reaching two peaks of different amplitude around the second week of November and the third of January. As it can be noticed from Figures 1 and 3 A spatial map of the regions grouped by using model M 2 is shown in Figure 5 , where each cluster is represented by a specific color. Figure 6 also offers a graphic visualization of the clusters where the posterior mean group functions are overlaid with the time series of each region. As expected, the curves in the same cluster appear similar in shape, whereas present different patterns or trends across the clusters. The regions that are geographically close are likely to be in the same cluster and this applies in particular to several regions in Italy, France, and Germany. However, as expected from the exploratory data analysis (Section 2), the clusters are not always dominated by the spatial proximity of the regions. J o u r n a l P r e -p r o o f An example is provided by the second cluster, which mainly includes regions from the South of Germany and the West-Southern part of Italy. Figure 5 : Spatial map representation of the estimated clusters. Each labelled cluster is shown by its own specific colour The temporal patterns of the mean groups represented in Figure 6 show that, after peaking in early April, the pandemic appeared to be well controlled until the beginning of August. As a result of the introduction of several containment measures and with increasing temperatures, all the regions have experienced relatively low counts of deaths, with the mean curves resulting flat for most part of the summer period. Focusing on the first phase of the pandemic, the subplots also suggest that the rate of change of the early counts of deaths in a few clusters were higher than all of the other clusters. This is confirmed in Figure 7 by the dynamics of the first derivative of the group mean functions, which help to understand their rate of change as well as the persistence of the phenomenon. As it can be noticed, the regions in clusters 5 and 7 show the highest rate of change (i.e., a higher velocity) during March and April. From the exploratory analysis of the data, it also results that cluster 5 includes some of the regions with the highest mortality rate as, for example, the Lombardy region (Italy) with the provinces of Bergamo and Brescia, which have been at the centre of Italy's coronavirus outbreak started in February 2020. Cluster 7 includes a few regions from Spain and one from England. This cluster appears highly heterogeneous in the period October 2020-February 2021. Furthermore, its mean function highlights a slightly delayed onset of the pandemic compared to cluster 5, which peaked earlier than cluster 7 in the first phase. Cluster 8 is similar in shape to cluster 7; however, the latter appears more heterogeneous, with a few time series showing different timing effects, especially in January and February 2021. In contrast with these patterns, cluster 1, mainly composed by regions from the Central and North-Eastern part of Germany and most of the Swiss Confederation, shows the smallest rate of change and slow dynamics of the phenomenon in the first phase. All other clusters do not seem to differ substantially in terms of variations during the first COVID-19 wave. Further differences among clusters can be appreciated by studying the dynamics of the series in the period August 2020 -February 2021. In general, all the series show that the number of deaths has surged after growth slowed in summer. Almost all the series, in fact, have been trending upward during the autumn. Exceptions to these trends are represented by the pattern in cluster 5 where the regions do not seem to show any important tendency of rebound after the first phase. Understanding the potential reasons behind this trend may be crucial to identify key factors in the definition of effective preventive policies. Changes in the healthcare system, in the exposure of the vulnerable population as well as restrictive measurements, such as local lockdowns, might all have played an important role in flattening the mortality curves. For example, many regions have seen a marked improvement in the healthcare system, both in terms of intensive care units (ICU) capacity as well as access to novel effective treatments, that have guaranteed a timely and effective assistance to the most severe cases with a consequent reduction of the mortality rate. A further explanation for the reduced mortality in cluster 5, during the second wave, could be found in the harvesting effect. According to this theory, the most vulnerable population, made of elderly and those with health conditions, might have died during the first COVID-19 wave, especially in regions with a high infection rate. By contrast, the regions that were spared from the first phase have then shown an increase of the mortality in the second phase. Clusters 1, 2, 3, 6, 9 and 10 show similar bimodal mean shapes in the last five months, with some differences appearing in the rate of change and in the persistence of the phenomenon. In particular, the number of deaths in clusters 1 and 2 rises steeply than in other clusters from late September, while in clusters 9 and 10 the resurgence appears slower and delayed to reach a plateau until the beginning of February. These temporal patterns also differ from the dynamics of clusters 4 and 11 for which only one peak is observed near the end of the third week of January, denoting a significant reappearance of the phenomenon in UK. For the regions in cluster 4, this resurgence could be linked to the English Coronavirus variant, which has rapidly outcompeted pre-existing variants in the Southeast region of the country (Davies et al., 2021) . Finally, the dynamics of cluster 12, which includes most of the regions with the lowest population density, highlight a significant rate of change (see, Figure 7 ) of mortality between October and November 2020, with regions reaching their peak around the second week of November. In contrast with many other clusters, for this group of regions, the mortality has steadily decreased from the third week of November to attain low-stable levels in 2021. In this paper, we have analysed the regional patterns of spatio-temporal data on confirmed deaths by COVID-19 in NUTS-2 regions from six European countries between March 2020 and February 2021. To uncover homogeneous patterns of COVID-19 progression, we have implemented a Bayesian nonparametric approach that models (standardized) time series patterns as a mixture of Gaussian processes, specifying a Dirichlet process prior over mixture components. Although the statistical analysis was carried out under Gaussianity assumptions, our model could be easily cast within the more general framework of generalised linear models (GLM), where the data are assumed to be from an exponential family with distribution F . For example, considering the death counts, one obvious choice would be to assume a Poisson regression model with a log link function. In order to keep computations for the posterior distributions simple, one may then apply the data augmentation technique proposed by Tanner and Wong (1987) . The proposed methodology has allowed grouping the 145 regions into 12 clusters providing evidence that the effects of the pandemic varied a lot both temporally and across countries and regions. Some of the areas that have been hit the hardest in the first phase (for example, the Northern region of Lombardy in Italy) have seen a steep decline in the number of deaths soon after implementing control strategies, with contagion and deaths far less widespread in autumn and winter 2020. In contrast, other areas, which have not been affected in the first wave, have seen a sharp rise in deaths during the second and third waves. An example is given by the southern regions of Italy, which became a new hotspot. One advantage of our analysis concerns the possibility of defining community mitigation strategies not only at countries levels, but also at the level of clusters, where their members (administrative areas) appear homogeneous in terms of temporal patterns. This clustering effect calls for a strong inter-governmental coordination and for a territorial approach in order to define joint solutions and enhance acceptance of measures at all levels. The aim is thus to facilitate a regional cooperation to support recovery strategies by ensuring coherent mitigation guidelines, pooling resources, and strengthening investment opportunities. In particular, promoting sub-national governments could help to evaluate the degree of asymmetry and differentiate aid schemes to align with the differentiated impact of COVID-19. In policy-making, trying to understand the underlying causes of this heterogeneity represents a crucial step to evaluate the efficacy of the different mitigation strategies adopted by central and local governments. However, assessing the effect of the individual intervention policies is a very challenging task due to the inherent multidimensional nature of the phenomenon under investigation. From the statistical analysis, in fact, there is a clear hint that differences in population density, age structure, urbanisation and socio-economic factors contribute to differentiate the impact of COVID-19. Indeed, even if coronavirus infects exposed subjects indiscriminately, the severity of disease and social and economic effects are not being felt equally throughout the European territory, partly due to existing differences in health between and within countries, mostly related to socio-economic inequities. To fully understand the impact of containment policies on the pandemic dynamic, it would be necessary to explicitly relate these factors to the death counts within the proposed methodological framework. Albeit the proposed model could easily accommodate the introduction of these factors, in practice, we have found it infeasible as most of the covariates arising from multiple sources are often spatially misaligned or aggregated over different geographical units and, more importantly, they are only spatially referenced, making it impossible to evaluate their temporal effect on disease spreading. Accordingly, here, we have not attempted to establish the causality of the observed differences between clusters, but we have simply provided a tool to evaluate the evolution of the pandemic across European regions. The inclusion of these covariates and the other risks factors will be the objects of future studies. A spatio-temporal model based on discrete latent variables for the analysis of COVID-19 incidence Statistical analysis of non-lattice data Efficiency of pseudolikelihood estimation for simple Gaussian fields Curve clustering with spatial constraints for analysis of spatiotemporal data On the distinction between the conditional probability and the joint probability approaches in the specification of nearest-neighbour systems On Gibbs sampling for state space models Determining the spatial effects of COVID-19 using the spatial panel data model Statistics for spatial data Model based clustering for expression data via a Dirichlet process mixture model Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England A Simple and Efficient Simulation Smoother for State Space Time Series Analysis Community mobility in the European regions during COVID-19 pandemic: a partitioning around medoids with noise cluster based on space-time autoregressive models Bayesian Density Estimation and Inference Using Mixtures An ensemble approach to short-term forecast of COVID-19 intensive care occupancy in Italian regions Spatial-temporal generalized additive model for modeling COVID-19 mortality risk in Toronto. Article in press in Spatial Statistics A Bayesian Analysis of Some Nonparametric Problems A note on intrinsic conditional autoregressive models for disconnected graphs A predictive approach to model selection Inference and Monitoring Convergence Determining the spatial effects of COVID-19 using the spatial panel data model The effect of large-scale anti-contagion policies on the COVID-19 pandemic A spatial model for multivariate lattice data Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information Markov Chain Monte Carlo in Approximate Dirichlet and Beta Two-Parameter Process Hierarchical Models Clustering random curves under spatial interdependence with application to service accessibility Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models A Space-Time Permutation Scan Statistic for Disease Outbreak Detection Multiple change point estimation of trends in COVID-19 infections and deaths in India as compared with WHO regions. Article in press in Spatial Statistics Quantifying the small-area spatio-temporal dynamics of the COVID-19 pandemic in Scotland during a period with limited testing capacity. Article in press in Spatial Statistics Time Series Clustering and Classification Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling Dirichlet process mixed generalized linear models A Bayesian Nonparametric Approach for Time Series Clustering Bayesian spatio-temporal joint disease mapping of COVID-19 cases and deaths in local authorities of England A Bayesian Kriged-Kalman model for short-term forecasting of air pollution level A constructive definition of Dirichlet priors Change point detection for COVID-19 excess deaths in Belgium Spatiotemporal analysis and hotspots detection of COVID-19 using geographic information system Dealing with label switching in mixture models The Calculation of Posterior Distributions by Data Augmentation Modelling With Mixtures (with discussion Assessing Mechanisms of Neural Synaptic Activity Bayesian Forecasting and Dynamic Models Comparison of spatiotemporal characteristics of the COVID-19 and SARS outbreaks in mainland China Andrea Bucci acknowledges the grant received by the project PON "Ricerca e Innovazione" 2014-2020 -Azione I.2 "Mobilità dei Ricercatori" -Avviso di cui al D.D. n. 407 del 27 febbraio 2018 e D.D. n.1621 del 12/08/2019 "Attraction and International Mobility" -LINEA 1 (Mobilità dei Ricercatori) (CUP n. D24I19001680001 -Id. proposta: AIM1894803 Linea 1: attività 3 Area di specializzazione SNSI: Smart, Secure and Inclusive Communities).