key: cord-0764978-pixckmlr authors: Cioban, Stefana; Mare, Codruta title: Spatial clustering behaviour of Covid-19 conditioned by the development level: Case study for the administrative units in Romania date: 2021-12-02 journal: Spat Stat DOI: 10.1016/j.spasta.2021.100558 sha: 404c3a8561a39eb0c19bbef617dacf4df16a53d0 doc_id: 764978 cord_uid: pixckmlr Spatial analyses related to Covid-19 have been so far conducted at county, regional or national level, without a thorough assessment at the continuous local level of administrative-territorial units like cities, towns, or communes. To address this gap, we employ daily data on the infection rate provided for Romanian administrative units from March to May 2021. Using the global and local Moran I spatial autocorrelation coefficients, we identify significant clustering processes in the Covid-19 infection rate. Additional analysis based on spatially smoothed rate maps and spatial regressions prove that this clustering pattern is influenced by the development level of localities, proxied by unemployment rate and Local Human Development Index. Results show the features of the 3rd wave in Romania, characterized by a quadratic trend. Our present world was, until last year, characterized by significant mobility, both national and international. This, obviously, led to a rapid spread of the Covid-19 virus (Andersen et al., 2021) . Consequently, the importance of space and spatial interactions has increased even more, along with studies accounting for spatial effects. Franch-Pardo et al. (2020) made a review of 63 such studies, showing the relevance of spatiality and its interdisciplinary applications in this field of research. Using data on the prefectures of China for January-February 2020, Liu et al. (2021) or Xie et al. (2020) show that environmental factors turned out to be highly impacting the spread of the virus, along with people's mobility or economic development. Andersen et al. (2021) or Mollalo et al. (2020) also try to depict the determinants of Covid-19 transmission at the local level on data for USA counties, emphasizing the need for geographic determination of the hotspots. Other recent studies attempt to build efficient methods that incorporate both time and space effects in order to model Covid-19's dynamics. Some examples are Lee et al. (2021) or Bartolucci and Farcomeni (2021) . The former assesses data at the postal districts level in Scotland, while the latter analyses Italian regions. Most of the studies are conducted at the national, regional, and county level, as data is usually provided in an aggregated format for these spatial layers. Through aggregation, important spatial information is lost (see, for example Schellekens and Sourrouille, 2020) , as the features of the dominant city in the county or region will prevail, while the features of the other localities will be mostly ignored. In a certain county, for example, there may be locality-level hot spots related to the Covid-19 infection rate, deaths related to the virus, etc., which, through spatial aggregation, are not properly depicted. At the same time, different city or village areas may have specific features that prevented them from registering serious Covid-19 problems. Moreover, the importance of spatial heterogeneity in Covid-19 issues assessment is highlighted in several studies (Thomas et al., 2020; Hou et al., 2021, etc.) . Consequently, these features are important as they may emphasize what can be done to prevent or reduce the severity of hot spots. Additionally, in some areas, a good performance in respect of the pandemic may not emphasize specific features, but, unfortunately, may indicate deficiencies in testing procedures or, even worse, in the reporting of results. Even though there are studies evaluating the issue in various cities (De Kadt et al., 2020) and metropolitan areas (Su et al., 2020) , to our knowledge, no study has yet assessed the problem of Covid-19 at the level of all administrative-territorial units (ATUs), in the continuous space of a country, at the locality level. And the main reason for this is the lack of data, most of the information being reported at more aggregated levels. We have, thus, identified the need to study the problems related to the spread of Covid-19 from a spatial perspective at the level of all cities and communes of a country. The reason behind choosing a finer granularity for analysis is easily understood from the eyes of the data-driven geospatial analyst. We use the Romanian ATUs for which data started to be registered and officially published on March 1, 2021. To our knowledge, this is the first study assessing the Covid-19 infection rate and its determinants at the level of all cities and communes for an emerging country using official data. Ahmadi et al. (2020) clearly emphasize the need for more spatial analysis using GIS and related at local scales, to properly depict the features of the pandemic. The purpose of our study is to evaluate the spatial distribution of the Covid-19 infection rate and assess the intensity of the spatial effects over time. We point out the clustering tendency of the infection rate at the ATUs level and how it correlates to the local development level and the unemployment rate, along with significant local contagion and diffusion. We also analyse the time trend of the daily autocorrelation index in relationship with other Covid-19 related indices for which no spatial observations were recorded. The clustering behaviour intensifies as the number of infections increases and it seems to cause a similar but delayed temporal pattern for the average number of Covid-19 related deaths and average number of recoveries at the country level. In addition, following the work of Schellekens and Sourrouille (2020) or Xie et al. (2020) , we show that, contrary to general expectations, a higher infection rate is characteristic to and clustered around more developed urban areas that report lower unemployment rates and that are proven to act as magnet cities (Cristea et al., 2017) and implicitly, have higher social interactions and better standards of living (Sandu, 2021) . The level of development may, therefore, act as a transmission and conditioning channel. Thus, among the goals of our research is to contribute to the clarification of the debate present in the field's literature by showing that there is a clear and positive relationship between development level and Covid-19 infections. The spatial units for which data was provided are given by the 3181 Romanian ATUs with 101 cities, 216 towns and 2864 communes (Sandu, 2020) , representing the NUTS-5 level in Romania. For the local development level, two proxies were chosen -unemployment rate and Local Human Development Index (LHDI), the latter being an index describing the social development and quality of life that was estimated and updated by Sandu (2020) for the World Bank. Starting from the Okun's law (Okun, 1962) , that describes the negative relationship between unemployment and the real GDP growth rate, and following the studies of Karikari-Apau and Abeti (2019), Meidani and Zabihi (2011) that prove short-and long-run negative relations between unemployment and economic development, or Mishel and Shierholz (2011) who point out the significant loss in incomes and welfares of countries due to unemployment, we have chosen the unemployment rate as a proxy for the economic development at ATUs level. We base this choice on the fact that unemployment implies lower income, leading to a lower general standard of living. Wealthier and more developed spatial units have lower unemployment levels, as also pointed by Cristea et al. (2017) for Romania. Table 1 presents the variables employed. For the exploratory spatial analysis phase, quartile maps were assessed for each striking turn in the infection curve of the third wave in Romania: the period in which the most Covid-19 infection cases were recorded -end of March 2021 -and the end of the study period -beginning of May 2021. As the goal of our study was to evaluate spatial clustering behaviour of Covid-19 infection rate, we have started our analysis following the work of Griffith (1987 Griffith ( , 2003 and used the global and local Moran's I Spatial Autocorrelation coefficient (LISA - (Anselin, 1995) ) to determine the degree of self-relative correspondence between spatial units (represented by ATUs): Where: x i,t = the value for ATU i, x j,t = the value for ATU j, µ t = the average value of the variable for all ATUs, w ij = the coefficient of the spatial weights matrix for i and j, n = number of ATUs. While the global Moran's I index is a measure of overall spatial autocorrelation of the Covid-19 infection rate, the clusters are locally identified and provided by the LISA analysis. The significance of the Moran's I index was assessed based on the z-score and p-value statistics, through the randomization procedure. For this and the following spatial modelling phases, we opted for a topology-based spatial weights matrix: the contiguity one, in the queen form, with rows-standardization applied (Griffith, 2020) . The row-standardization allows for the interpretation of the spatial lag of the dependent variable represented by the infection rate as the weighted average value of neighbouring observations. Moreover, the global Moran's I index for the daily Covid-19 incidence was analysed and recorded in order to model the evolutionary pattern of the clusterization process. Time was considered as main factor and used to estimate the trend of the global Moran's I index. A 2nd order equation turned out to be the most suitable, following the structure of Eq. (2), in which t is the daily time variable: Additionally, this approach allows the time series dependence analysis against other covidrelated indicators which were not available at the ATU level, but only at national level: new tests, new deaths, recovered people, new cases retrieved from the dataset on the evolution of Covid-19 in Romania (Vana, 2021) . This procedure aimed to understand the clustering behaviour of the Covid-19 incidence rate in relation to other indicators. Results are presented in Appendix C. Having both cities and communes in our database implies high heterogeneity. To treat this and assess the relationship between the incidence rate and development, we construct two spatially smoothed maps of the ratio between Covid-19 incidence on the day the maximum global Moran's I was recorded, March 26, and LHDI and unemployment rate, respectively. Using the same contiguity matrix, the smoothed rate for a given ATU i is calculated as follows: where O j is the Covid-19 incidence rate in location j and P i is the development proxy. This process smoothens the values of the incidence rates in space by using window averages based on the neighbouring scheme (Anselin et al., 2006) . In this way the broad spatial trends are highlighted, and data peaks are treated, as the number of outliers identified using the LISA analysis is reduced. Spatial processes like clusterization, polarization, contagion, or diffusion are better assessed through this approach. To confirm the results of the spatially smoothed maps and assess the significance of the development level impact upon Covid-19, we perform the spatial regression analysis using Covid-19 infection rate for March 26 as dependent and introduce the LHDI and unemployment rate separately. Following the standard procedures, we start from the OLS (Eq. (4)) with the spatial diagnosis tests that reveal both autoregressive and moving average spatial processes (SARMA) that lead to inconsistent OLS estimators: Where Y is an nx1 vector of the dependent variable, infection rate, α is the constant of the model, X is the observation matrix of the explanatory variable for the n ATUs, β is the 1x1 vector of the regression coefficients for the development measure, ε is the nx1 vector of the error term, and n is the total number of 3181 ATUs. Our data is characterized by a low spatial data aggregation level which leads to high heterogeneity that needs treatment, as emphasized by the highly significant heteroskedasticity tests Therefore, we estimate and present the results of the spatially weighted two-stage least squares (HET) estimations (GMM method for spatial lag and error model with heteroskedasticity - (Anselin, 2011) , Eq. (5)) 1 : 1 Results of intermediary estimations are available upon request. Here, ρ is the spatial autoregressive coefficient, W is the nxn spatial lag operator, while WY is the spatially lagged infection rate, X is the nx1 matrix of observations on the exogenous explanatory variable, the development proxy, with the β coefficient vector, u represents the spatial autocorrelated errors which are respecified based on their spatial lag, λ is the coefficient of the spatial autoregressive component of the residuals (the error correction term), Wu is the spatial lag of the errors and ε is the final error term of the model, obtained after the error correction process. The error term is normally distributed, of E(ε) = 0 and E(εε') = σ 2 I. The idea of this methodological approach is to incorporate the spatial autoregressive and heteroskedastic structures for the error variance in our model and to obtain more efficient regression coefficients. Table 2 presents the synthesis of these results. Analyses were conducted in Python and published together with the pre-processed dataset in a GitHub repository (Cioban et al., 2021) . For the Covid-19 infection rate, the monthly mean of all ATUs during the analysed period increased from 1.42 in March to 1.67 in April, with a decrease in May to 0.88. While a rate of 0 infections was recorded during multiple days in multiple cities or villages from almost all counties in Romania −41 specifically-, the maximum of 37.7 was recorded for two days, March 17th and 18th, in Bălcesti, a town located in Vâlcea County. Lower rates can be explained not only by a low infection spread within the particular spatial units, but also by a low testing rate or by the lack of reporting of the daily infection numbers. The quartile maps (Fig. 1) constructed for every turning point in the infection trend line give an insight into clustering tendencies which seem to correspond to the overall tendency of the total reported Covid-19 cases per day. While at the beginning of March and May, the quartile distributions of infection rates were more random, the representation of the same map for the end of March 2021 and beginning of April seems to enhance a stronger clustering tendency. Therefore, the daily global spatial autocorrelation was analysed for statistical significance. The positive Moran's I for the Covid-19 infection rate indicated a global tendency towards clustering (similar behaviour grouped in space, validated through the simulated p-value) (Fig. 2) . Positive spatial autocorrelation also indicates possible contagion and diffusion processes (Mare (Fig. 3) for the rate of Covid-19 infections, an increase in the overall tendency of spatial dependence was observed in the second half of March, followed by a decreasing trend by the end of the study period. The maximum value for the global Moran's I index was recorded for March 26, 2021. What is more, a statistically significant polynomial trend of the 2nd order was estimated on top of the daily series with an R-squared of 0.8 (Fig. 3) , emphasizing the evolution of the 3rd wave in Romania. The Moran's I values were also modelled together with other covid-related indicators (available only at country level) and results show significant impact when the clusterization series are compared to the average new daily cases, the average number of daily deaths and to the average number of recovered people (see Appendix C). Among all indices, the clustering behaviour in time is best explained by the number of daily infections: the more daily Covid-19 attacks at the country level, the stronger the tendency towards clusters of similar values. The local spatial autocorrelation analysis was performed for the day when the maximum global Moran's I index was computed, on March 26 (Fig. 4) . When plotting the clusters approximated by LISA, an interesting pattern can be observed for the high-high clusters as they seem to coincide with developed regions across the country. The most prominent clusters of high infection rates surrounded by other high values are Timisoara, Arad, Deva, Targu Jiu, Cluj-Napoca and Bucharest, followed by more isolated clusters around Brasov, Pitesti, Oradea, Constanta, and Iasi. These regions are generally represented by a higher LHDI, with stronger economy, industry, and education. Additionally, according to the World Bank (Cristea et al., 2017) , most of them act as magnet cities in Romania. The low-low clusters are mostly spread across the North of the country, the East close to the border with the Republic of Moldova, with prominent such clusters alongside the Eastern Carpathian Mountains. These clusters correspond to either remote, less populated areas or simply to less developed regions, which are at risk of poverty or social exclusion. In terms of the association of dissimilar values or outliers, the high-low and low-high clusters do not stand out and are generally isolated and small when compared to the other clusters. As stated in the methodology, we use the unemployment rate and the LHDI as proxies for the level of development. The bivariate analysis (Fig. 5 ) enhances the findings of the spatial agglomerations of high-value clusters around the developed urban areas and are also in line with the findings at the global level (Schellekens and Sourrouille, 2020) . While globally, the mortality, and implicitly, the rate of infection, are more prominent in the high-income countries, at a national level, in the case of a developing country, Romania, the high values of the rate of infection cluster around the more developed urban areas. This is because, usually, the cluster centres are cities that act as ''attraction points'' for the working force. They are generally essential university centres or have other specificities that determine people to move there and have a socially interactive life and higher mobility tendency. Consequently, their local economies are characterized by high development levels and low unemployment (Appendices A and B). But being attractive comes with a cost: a higher infection probability. The regression results confirm the significant impact of development upon Covid-19 spatial spread and clusterization ( Table 2 ). The OLS results clearly show the presence of complex spatial processes, both autoregressive and moving average (significant LM SARMA). Regardless of the estimation method used, with or without heterogeneity treatments, the coefficients of LHDI are positive, while those of the unemployment rate are negative, and both highly significant. Consequently, our assumptions, along with the information given by the spatially smoothed map results are confirmed. Covid-19 infection rate clusters around wealthier and highly urbanized localities with higher living standards and lower unemployment rates, as they have specific economic features that make them magnets for citizens. The spatial lag of Covid is positive and significant, confirming the contagion and diffusion processes. Lambda is significant only when the unemployment rate is considered as proxy, for the LHDI it becomes unsignificant. This can be explained by the fact that the error correction term is modelling the spatial impact of other factor variables upon the Covid-19 rate, which are not included in the model. Consequently, the errors of the original model (be it OLS or other) are spatially correlated and treatment is required in the form of spatial error correction. When the unemployment rate is considered as an explanatory variable, there are definitely other aspects that are left aside and have a significant positive spatial impact upon the spread and clusterization of Covid-19 cases at the ATUs level. That is why the lambda coefficient is positive and significant. In contrast, the LHDI is a composite index made up of several variables that incorporate, among others, issues related to the standard of living, education, etc. Consequently, it brings into the model additional information in the form of the explanatory variable and incorporates the spatial effects left aside by the simple consideration of unemployment. The overall spatial effect being positive, the above-mentioned results are confirmed. Assessing the Covid-19 infection rate for the ATUs in Romania has emphasized significant clustering processes, along with contagion and diffusion, conditioned by the level of development as proxied by the most recently available unemployment rate and the local human development level. This may be explained by the fact that more developed cities are magnets for the workforce, having higher living standards, as they provide better quality jobs, life expectancy and more opportunities. Additionally, people living in such ATUs are characterized by a higher education level and consequently by a higher level of awareness towards the spread of the virus, leading to an increase in the testing rate and, of course, in the reported infection cases. Nonetheless, a viable reason is the high mobility engaged in well-developed urban areas, both internal (migration and commuting) and external (professional and touristic). The peak in the 3rd Covid-19 wave in Romania (the turning point of the quadratic evolution) and the decreasing trend after the end of March 2021, along with decreasing intensity in the local spatial autocorrelation, may be explained by a lag effect. By this time, the vaccination process in Romania had become accessible to all people without a preliminary appointment. Hence all citizens were able to undergo it, and the effects are starting to be felt. As expected, the tendency towards clustering follows the temporal pattern of the number of infections themselves -the more infections, the higher the positive autocorrelation -and it seems to cause a similar, but delayed effect for the country level Covid-19 caused deaths and recoveries. Data at this administrative level has only begun being registered for Romania. We will continue this assessment by also considering more complex space-time models once the database increases. Additionally, we intend to introduce other variables in the modelling process to depict local specificities and Covid-19 infection determinants. A significant limitation of the analysis is given by data availability, that prevents us to conduct a more detailed research, compare the 3rd wave features with the previous two or explore the dependence between the rate of infection and development level/ unemployment from a spatiotemporal perspective. Another issue is the availability of data related to the development level of the cities and communes in Romania. Just like in the case of other countries, such data is only assessed at census times, meaning once in ten years. There are very few variables measured more frequently by the National Statistics Institute or other entities. Modeling and forecasting trend of covid-19 epidemic in iran until may 13, 2020 Analyzing the spatial determinants of local Covid-19 transmission in the United States Local indicators of spatial association-LISA GMM Estimation of Spatial Error Autocorrelation with and Without Heteroskedasticity Rate Transformations and Smoothing Covid-19 -Romanian economic impact monitor. UBB FSEGA A spatio-temporal model based on discrete latent variables for the analysis of COVID-19 incidence Covid_19_spatial_analysis, retrieved from the GitHub repository Magnet Cities: Migration and Commuting in Romania Mapping vulnerability to COVID-19 in gauteng Spatial analysis and GIS in the study of COVID-19 Spatial autocorrelation Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization Some guidelines for specifying the geographic weights matrix contained in spatial statistical models 1 Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race The Impact of Unemployment on Economic Growth in China Quantifying the small-area spatio-temporal dynamics of the Covid-19 pandemic in Scotland during a period with limited testing capacity The spatial clustering analysis of COVID-19 and its associated factors in mainland China at the prefecture level Insurance literacy and spatial diffusion in the life insurance market: A subnational approach in Romania The dynamic effect of unemployment rate on per capita real GDP in Iran Sustained, high joblessness causes lasting damage to wages, benefits, income, and wealth GIS-based spatial modeling of COVID-19 incidence rate in the continental United States Potential GNP: Its Measurement and Significance Actualizarea Indicelui DezvoltăRii Umane Locale Fundamentare;I Aplicabilitate (Updating the Local Human Development Index. Substantiation and Applicability) De unde vin infectările cu noul coronavirus la nivel local: între compozi;ia popula;iei;i sistemul urbanregional (roots of COVID-19 infections at the local level: between population composition and urban-regional systems) The Unreal Dichotomy in COVID-19 Mortality Between High-Income and Developing Countries. The Brookings Institution Evaluation of the secondary transmission pattern and epidemic prediction of COVID-19 in the four metropolitan areas of China Spatial heterogeneity can lead to substantial local variations in COVID-19 timing and severity Evolution covid19-Romania Spatial and temporal differentiation of COVID-19 epidemic spread in mainland China and its influencing factors We would like to thank the Editor in Chief, prof. Alfred Stein, and 3 anonymous reviewers for their support and valuable comments that helped us improve our manuscript. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.