authors: Peng, Di; Qian, Jian; Wei, Luyi; Luo, Caiying; Zhang, Tao; Zhou, Lijun; Liu, Yuanyuan; Ma, Yue; Yin, Fei
title: COVID-19 distributes socially in China: A Bayesian spatial analysis
date: 2022-04-20
journal: PLoS One
DOI: 10.1371/journal.pone.0267001

PURPOSE: The ongoing coronavirus disease 2019 (COVID-19) epidemic increasingly threatens public health security worldwide. We aimed to identify high-risk areas of COVID-19 and understand how socioeconomic factors are associated with the spatial distribution of COVID-19 in China, which may help other countries control the epidemic. METHODS: We analyzed data on COVID-19 cases from 30 provinces in mainland China (outside of Hubei) from 16 January 2020 to 31 March 2020, together with data on demographic, economic, health, and transportation factors. Global autocorrelation analysis and Bayesian spatial models were used to present the spatial pattern of COVID-19 and explore the relationship between COVID-19 risk and various factors. RESULTS: The global Moran's I statistic of COVID-19 incidence was 0.31 (P<0.05). The areas with a high risk of COVID-19 were mainly located in the provinces around Hubei and the provinces with a high level of economic development. The relative risks of two socioeconomic factors, the per capita consumption expenditure of households and the proportion of the migrating population from Hubei, were 1.887 [95% confidence interval (CI): 1.469~2.399] and 1.099 (95% CI: 1.053~1.148), respectively. The two factors explained 78.2% of the 99.7% of variation attributed to structured spatial effects. CONCLUSION: Our results suggest that COVID-19 risk was positively associated with the level of economic development and population movements. Blocking population movement and reducing local exposures are effective in preventing the local transmission of COVID-19.
Pneumonia caused by a novel coronavirus was identified in Wuhan, China, in December 2019 and then named coronavirus disease 2019 (COVID-19). Most patients experience mild to moderate respiratory symptoms, e.g., fever, fatigue, cough, and shortness of breath, but some patients with underlying health disorders, such as respiratory or cardiovascular diseases, can develop severe illness and even die [1]. A small proportion of COVID-19 cases are also reported without any symptoms but with positive viral nucleic acid test results [2]. COVID-19 transmits mainly via respiratory droplets and direct contact [3]. The incubation period of COVID-19 is within 14 days with a mean of 5.2 days, and the mean basic reproductive number ($R_0$) ranges from 1.4 to 3.9 [3] [4] [5] [6] [7]. Due to the long incubation period and high infectivity, COVID-19 cases can accumulate unidentified, and the high proportion of severe cases that follow may quickly overload the health system, consume local health resources, and cause many deaths [8] [9] [10] [11]. The World Health Organization (WHO) therefore declared COVID-19 a pandemic on 11 March 2020. As of 30 August 2020, the COVID-19 epidemic had affected more than 200 countries and regions, with more than 20 million confirmed cases worldwide. At present, several COVID-19 vaccines have been authorized for use throughout the world [12]. However, limited vaccine supply and coverage remain a challenge due to insufficient production and allocation. To date, public health intervention is still one of the most efficient ways to control the development of the COVID-19 pandemic [12, 13]. Studying the spatial patterns of COVID-19 and the factors that influence them is important for identifying high-risk areas for further intervention and for evaluating risk when developing local prevention and control strategies. Additionally, health resources can be allocated in advance to prepare for the peak of incoming patients with severe illness.
The distribution of COVID-19 shows significant spatial heterogeneity [14]. The occurrence and transmission of COVID-19 in the population may be influenced by natural, environmental, and social factors [14] [15] [16] [17] [18]. However, to date, there is a lack of comprehensive studies quantifying the spatial pattern of COVID-19 and detecting the spatial impact of potential influencing factors on COVID-19. In our study, using spatial methods including spatial autocorrelation and Bayesian spatial models, we analyzed data from 30 provinces in mainland China (outside of Hubei) and presented the spatial pattern of COVID-19. We detected the hot spots and cold spots of COVID-19 and explored the relationship between the risk of COVID-19 and socioeconomic factors. The purpose of this study was to determine high-risk areas of COVID-19 and explore potential influencing factors, to provide a basic understanding of COVID-19 for further public health interventions and resource allocation in other countries and areas.

Data on the COVID-19 cases in 30 provinces in mainland China from 16 January 2020 to 31 March 2020 were collected from the National Health Commission of the People's Republic of China. Hubei Province, where COVID-19 was first detected, was not included, as this study aimed to explore the socioeconomic and environmental risk factors after the import of COVID-19 cases. Four types of variables were collected for each province. Demographic variables included the total population, population density (PD), proportion of urban population at year-end (PUP), and percentage of illiterate population among the total population aged 15 and over (PIP). Economic variables included gross domestic product (GDP), foreign exchange earnings from international tourism (FEEFIT), and per capita consumption expenditure of households (PCCEOH).
Health variables included the number of health care institutions (NOHCI), number of beds in health care institutions (NOB), medical technical personnel in health care institutions per 1000 persons (MTP), and beds of medical institutions per 1000 population (BOMI). Transportation variables included the passenger traffic (PT), passenger kilometers (PK), and proportion of the migrating population from Hubei from 16 January 2020 to 24 January 2020 (POMPFH). The POMPFH variable was provided by the Baidu Qianxi website [19]. The other variables were collected from the China Statistical Yearbook (2019).

The cumulative incidence of COVID-19 in each of the 30 provinces and descriptive statistics [range, mean, standard deviation (SD), median, and quartiles] of the variables were calculated. Depending on whether the variables were normally distributed, Pearson or Spearman correlation coefficients were used to estimate the crude correlation between the incidence of COVID-19 and provincial variables. Global Moran's I of the incidence of COVID-19 was calculated as follows to quantify spatial autocorrelation [20]:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $n$ ($n = 30$) is the number of studied provinces; $y_i$ and $y_j$ are the incidence of COVID-19 in the $i$th and $j$th provinces, respectively; $\bar{y}$ is the average incidence of COVID-19; and $w$ is the weight matrix measuring the adjacency relation of provinces, with $w_{ij} = 1$ if provinces $i$ and $j$ are neighbors and 0 otherwise. Moran's I ranges from -1 to 1: values above 0 indicate positive spatial autocorrelation of the incidence of COVID-19 (neighboring provinces have similar incidence), and values below 0 indicate negative spatial autocorrelation.

Bayesian spatial models were built to present the spatial pattern of COVID-19 and estimate the comprehensive relationship between the COVID-19 risk and variables.
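As a minimal illustration of the global Moran's I statistic defined above, the following Python sketch computes it for a binary adjacency matrix. The function name `morans_i` and the four-region toy data are hypothetical and are not the study data; the study itself used the R packages listed in the methods, and this sketch does not compute the significance test.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    weights[i][j] is 1 if regions i and j are neighbors and 0 otherwise
    (binary adjacency, as described in the text).
    """
    n = len(values)
    if any(len(row) != n for row in weights) or len(weights) != n:
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)       # sum of all w_ij
    denom = sum(d * d for d in dev)                # sum of squared deviations
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for these inputs")
    # Cross-product of deviations over all neighboring pairs
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / w_sum) * (num / denom)


# Hypothetical toy example: four regions arranged in a line (0-1-2-3).
line_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = [1.0, 2.0, 3.0, 4.0]   # neighbors are similar
alternating = [1.0, 4.0, 1.0, 4.0]       # neighbors are dissimilar

print(morans_i(smooth_gradient, line_adjacency))
print(morans_i(alternating, line_adjacency))
```

The smooth gradient gives a positive value and the alternating pattern a negative one, which matches the interpretation of the sign of Moran's I.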
The Bayesian spatial model assumed that the reported count of COVID-19 cases ($Y_i$) for the $i$th province followed a Poisson distribution with mean $\lambda_i$ [21, 22]:

$$Y_i \sim \mathrm{Poisson}(\lambda_i)$$

and

$$\lambda_i = E(Y_i)\,\rho_i$$

where $E(Y_i)$ is the expected count of COVID-19 cases for each province, calculated by multiplying the overall incidence of COVID-19 in the 30 provinces by the population of each province during the study period; $\rho_i$ is the relative risk (RR) of COVID-19 in the $i$th province, and the log RR was modeled as

$$\log(\rho_i) = b_0 + \sum_{k} b_k x_{ki} + u_i + v_i$$

where $b_0$ is the intercept, representing the average risk of COVID-19 across the 30 provinces; $x_{ki}$ is the $k$th socioeconomic or environmental variable in the $i$th province; $b_k$ and $e^{b_k}$ are the regression coefficient and RR of the $k$th variable, respectively; $u_i$ is a spatially structured random effect, reflecting the spatial correlation between adjacent provinces; and $v_i$ is a spatially unstructured random effect, which is caused by other nonspatial factors and can be interpreted as unknown or unobserved covariates at the provincial level. $u_i + v_i$ quantifies the total spatial effect, and $\exp(u_i + v_i)$ is the spatial RR of the disease.

The integrated nested Laplace approximation (INLA) method was used to estimate the parameters of the Bayesian spatial model, and minimally informative priors were specified on the above random effects by default [23]. $b_0$ and $b_k$ were assigned vague Gaussian priors with mean zero and variance $10^6$. $u_i$ was modeled using a conditional autoregressive (CAR) specification with mean zero and variance $s_u^2$, and $v_i$ was modeled as exchangeable with mean zero and variance $s_v^2$. The precisions of $u_i$ and $v_i$ were specified as Gamma(1, 0.0005).

The spatial risk of COVID-19 for each province was identified based on the results of the Bayesian spatial model without covariates. According to the posterior distribution of $u_i + v_i$, the 30 provinces were divided into hot, cold, and other spots.
A province with posterior probability $p(\exp(u_i + v_i) > 1 \mid \text{data}) > 0.9$ was defined as a hot spot; a province with posterior probability $p(\exp(u_i + v_i) > 1 \mid \text{data}) < 0.1$ was defined as a cold spot; the remaining provinces were neither hot nor cold spots. In addition, the proportion of variance explained by $u_i$ was calculated from the posterior marginal distributions of $s_u^2$ and $s_v^2$ to quantify the relative importance of $u_i$. The larger the proportion, the more of the variability was explained by $u_i$. Then, based on the results of multivariate Bayesian spatial models, the relationship between COVID-19 and the variables was explored, and the coefficients of the variables were estimated. Data were analyzed in R software (version 3.6.1, using the "base", "maptools", "raster", "spdep", "rgdal", "psych", "ggplot2" and "INLA" packages).

From 16 January 2020 to 31 March 2020, a total of 13756 cases of COVID-19 were reported from the 30 provinces in mainland China (outside of Hubei). Fig 1 shows the spatial distribution of the cumulative incidence of COVID-19 at the provincial level. The incidence of COVID-19 ranged from 0.03/100000 (Tibet) to 2.70/100000 (Beijing). Provinces with high incidence were mainly located in eastern and southeastern China. The socioeconomic and environmental variables are presented in Table 1. Spearman correlation coefficients between the incidence of COVID-19 and the variables ranged from -0.28 to 0.65 (S1 Table). Among the 13 variables, the incidence of COVID-19 was significantly associated with PD (r = 0.65, P<0.01), POMPFH (r = 0.57, P<0.01), PCCEOH (r = 0.54, P<0.01), PUP (r = 0.53, P<0.01), GDP (r = 0.46, P<0.05), and FEEFIT (r = 0.43, P<0.05). However, the correlations among these 6 socioeconomic variables were also significant. To reduce multicollinearity, one variable from each pair with a Spearman correlation coefficient greater than 0.7 was excluded. Thus, PUP and GDP were excluded from subsequent analyses.
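The multicollinearity screen described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say which member of a highly correlated pair was dropped, so the sketch assumes (hypothetically) that the variable more strongly correlated with the outcome is kept. The function names and the toy covariates `x1`, `x2`, `x3` are invented for the example and are not the study variables.

```python
def average_ranks(xs):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def drop_collinear(outcome, covariates, threshold=0.7):
    """Keep covariates so that no kept pair has |Spearman rho| > threshold.

    Assumption (not from the source): covariates are considered in order
    of decreasing |rho| with the outcome, so the stronger one of a
    collinear pair is retained.
    """
    strength = {name: abs(spearman(outcome, vals))
                for name, vals in covariates.items()}
    ordered = sorted(covariates, key=lambda name: strength[name], reverse=True)
    kept = []
    for name in ordered:
        if all(abs(spearman(covariates[name], covariates[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data for six regions.
incidence = [1, 2, 3, 4, 5, 6]
toy_covariates = {
    "x1": [1, 2, 3, 4, 5, 6],   # perfectly rank-correlated with incidence
    "x2": [1, 2, 3, 4, 6, 5],   # nearly identical ranking to x1
    "x3": [3, 1, 6, 2, 5, 4],   # weakly related to x1
}
print(drop_collinear(incidence, toy_covariates))
```

In the toy data, `x2` is dropped because its rank correlation with `x1` exceeds 0.7, while `x3` is retained.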
The estimated Moran's I statistic of the incidence of COVID-19 was 0.31 (P<0.05), indicating global spatial autocorrelation of COVID-19 at the provincial level in mainland China during the study period; that is, the incidence of COVID-19 was similar among adjacent provinces.

A Bayesian spatial model without covariates was fitted. The estimated spatial RR ($\exp(u_i + v_i)$) of COVID-19 by province is shown in Fig 2A, where 14 of 30 provinces had a 95% CI of the [...]. Fig 2B maps the distribution of hot and cold spots of COVID-19 in the 30 provinces. Among the 30 provinces, 15 and 14 provinces were classified as hot spots and cold spots, respectively. Shandong was the only province that was neither a hot spot nor a cold spot. The posterior means of $s_u^2$ and $s_v^2$ were 1.277 and 0.002, respectively. The proportion of structured spatial variance in the total variance was 99.75%, indicating that a large part of COVID-19 variability could be explained by the spatial structure. Potential factors associated with the structured spatial variability of COVID-19 should therefore be included in the Bayesian spatial model for analysis.

Four variables, i.e., PD, FEEFIT, PCCEOH, and POMPFH, were incorporated into the Bayesian spatial model. The results of the model showed that PCCEOH and POMPFH were significantly and positively associated with COVID-19 risk. The estimated coefficients for PD and FEEFIT were not significant. A multivariate Bayesian spatial model with only the significant variables, PCCEOH and POMPFH, was then fitted. The final coefficients and RRs of PCCEOH and POMPFH are listed in Table 2. The results showed that a 10,000 yuan rise in PCCEOH was related to an increase of 88.7% in the COVID-19 risk, and a 1% rise in POMPFH was related to an increase of 9.9% in the COVID-19 risk.
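The percentage increases quoted above follow directly from the reported RRs, since an RR of $e^{b_k}$ per unit of a covariate corresponds to a change of $(e^{b_k} - 1) \times 100\%$ in risk. A small Python sketch of this arithmetic is given below; the function name `rr_to_percent_change` is hypothetical, the RR values are the ones quoted in the abstract, and the extension to several units assumes the log-linear form of the model.

```python
import math


def rr_to_percent_change(rr, units=1.0):
    """Percent change in risk for a `units`-unit rise in a covariate.

    Under a log-linear model, the RR for `units` units is rr ** units.
    """
    return (rr ** units - 1.0) * 100.0


rr_pcceoh = 1.887   # RR per 10,000 yuan of PCCEOH (from the abstract)
rr_pompfh = 1.099   # RR per 1% of POMPFH (from the abstract)

print(f"{rr_to_percent_change(rr_pcceoh):.1f}")   # 88.7
print(f"{rr_to_percent_change(rr_pompfh):.1f}")   # 9.9

# The corresponding regression coefficient is the log of the RR.
b_pcceoh = math.log(rr_pcceoh)
print(round(math.exp(b_pcceoh), 3))                # 1.887
```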
The reason for the statistical significance of PCCEOH and POMPFH may be that people from Hubei mainly went to the provinces adjacent to Hubei, and the areas with high economic development experienced more population movements during the Spring Festival travel season. The estimated spatial RR of COVID-19 is shown in Fig 3. Overall, the spatial heterogeneity of RR was reduced, and most RRs were close to 1. Furthermore, the posterior means of $s_u^2$ and $s_v^2$ became 0.002 and 0.227, respectively, and the proportion of structured spatial variance in the total variance was reduced to 21.51%. Thus, the covariates in the model explained 78.24% of the total 99.75% of the structured spatial variance and considerably reduced the spatial heterogeneity of RR. However, high RRs were still located in Jiangxi, Ningxia, Hainan, and Heilongjiang. Further research is needed to investigate other variables that may explain the excess risk in these provinces.

From the time it was declared a Public Health Emergency of International Concern (PHEIC) on 30 January 2020 to the time it was declared a global pandemic on 11 March 2020, COVID-19 increasingly threatened public health security worldwide. This study investigated the spatial distribution of COVID-19 and its relationship with potential influencing factors in China. The results revealed that the areas with a high risk of COVID-19 were in the vicinity of Hubei Province and in the provinces with high economic development, as well as Heilongjiang, Hainan, and Ningxia. The proportion of structured spatial variation in the total variation was reduced from 99.75% to 21.51% by including socioeconomic covariates. Most of the variation in COVID-19 could be explained by socioeconomic factors, which indicates that socioeconomic factors might play an important role at the early stage of the pandemic.
The per capita consumption expenditure and the proportion of the migrating population from Hubei were positively correlated with the risk of COVID-19. These findings can be helpful for the control and prevention of potential epidemics and emergencies in the future.

Several policies, such as home quarantine, lockdown, and social distancing, were adopted to prevent the spread of COVID-19 during the pandemic. These policies varied by region, and their impacts on the COVID-19 pandemic differed. For instance, the traffic blockade in Wuhan city delayed the development of COVID-19 in China [24], while in provinces other than Hubei, measures such as home quarantine of people arriving from other regions played a great role in controlling the local epidemics. However, the effects of these measures were not considered in our study due to the difficulty of quantifying them. Further research is needed to explore the impacts of the differences in local policies.

In addition, the uncertainty of our study should be mentioned. The COVID-19 case data in our study were collected from the National Health Commission of the People's Republic of China; cases were diagnosed and reported promptly according to the prevention and control plan for novel coronavirus pneumonia [25]. The socioeconomic factors were obtained from the China Statistical Yearbook.

The high-risk areas of COVID-19 were mainly in eastern and southern China, which was probably associated with population mobility [26, 27]. The COVID-19 outbreak originated in Wuhan, Hubei, China, and coincided with the Spring Festival transport season in 2020.
Due to the high infectivity and long incubation period of COVID-19, as well as the lack of prevention and control management before the emergency response was launched, infected individuals and COVID-19 patients travelled to other provinces, causing the COVID-19 epidemic to spread across the provinces that received many people from Hubei. From the beginning of the Spring Festival transport season to the 2nd day of the traffic blockade in Wuhan, over half of the outflowing population from Hubei went to adjacent provinces, thus increasing the local risks. The adjacent provinces included Henan, Hunan, Jiangxi, Chongqing, and Anhui. However, high COVID-19 risk was still associated with high per capita consumption expenditure even after controlling for the proportion of the migrating population from Hubei. Provinces with high socioeconomic development (Guangdong, Beijing, Shanghai, Zhejiang, and Jiangsu) had larger inflow populations during the Spring Festival transport season than other provinces. The proportion of the population from Hubei in these provinces reflected only the population imported directly from Hubei. The cases imported through transit might be indirectly reflected in the per capita consumption expenditure. A study by Wu et al. found independent and self-sustaining community transmission in several major cities in China between 31 December 2019 and 28 January 2020, which might be closely related to the local population density [28]. Notably, the correlation between COVID-19 risk and population density or foreign exchange earnings from international tourism was not statistically significant after controlling for the proportion of the population from Hubei. The reason might be that the national emergency response implemented at the end of January 2020 reduced local COVID-19 exposure through household restrictions and the wearing of masks.
Community transmission of COVID-19 was then controlled [29, 30], as were the effects of local population density. This result suggests that the emergency response measures adopted in China sufficiently controlled community transmission. In addition, the foreign exchange earnings from international tourism, which partly reflect the internationally imported population, were found to be nonsignificant, as there were only a few infected individuals and COVID-19 patients at the early stage. However, with the increasing global threat, more attention should be paid to the risk of imported cases.

High RRs of COVID-19 were found in Jiangxi, Ningxia, Hainan, and Heilongjiang. The high risks in these four provinces were not explained by the factors considered in this study. One possible reason is variation in the general public's compliance with the local government's prevention and control measures during the COVID-19 pandemic. Some COVID-19 patients did not adhere to the prevention measures and even concealed their visits to Wuhan or their close contact with people from Wuhan. Such behaviors led to clustered cases and increased the local RR. Several COVID-19 clusters were reported in these four provinces, most of which were family clusters. As of February 6, Heilongjiang had reported 48 clustered outbreaks with 194 cases [31]. Hainan Province had reported 12 clustered outbreaks as of February 3 [32]. In Ningxia, 43 family-cluster cases were reported from January 27 to March 16 [33]. In Jiangxi, a total of 195 clustered outbreaks had been reported as of February 28 [34].

This study has several limitations. Firstly, the province was the geographic unit used in the spatial analysis. To obtain more useful information and support more accurate intervention, further studies should analyze a smaller geographic unit, such as the prefecture level.
Secondly, we only explored the relationship between the risk of COVID-19 and socioeconomic factors in this study. Natural and environmental risk factors such as climate and air pollution should be considered in further studies. Thirdly, the impacts of local policies were not considered in this study due to the lack of a method to quantify different local policies. Further research is needed to explore the impacts of local policies.

In summary, this study revealed the spatial distribution of COVID-19 and investigated its influencing factors in 30 provinces in mainland China (outside of Hubei). The high-risk areas were in the vicinity of Hubei Province and in the provinces with high economic development. The risk of COVID-19 was associated with two socioeconomic factors, the per capita consumption expenditure of households and the proportion of the migrating population from Hubei. Both the model analysis and the actual actions in China showed that blocking population movement and reducing local exposures (by staying at home and wearing masks) were effective in preventing the local transmission of COVID-19.

Supporting information
S1 Table. Spearman correlation coefficients between COVID-19 incidence and each variable of the 30 provinces in mainland China (outside of Hubei). (PDF)
S1 File. (DOCX)

1. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention
2. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Zhonghua liu xing bing xue za zhi
3. The epidemic of 2019-novel-coronavirus (2019-nCoV) pneumonia and insights for emerging infectious diseases in the future
4. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
5. Pattern of early human-to-human transmission of Wuhan
6. The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
7. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
8. Potential scenarios for the progression of a COVID-19 epidemic in the European Union and the European Economic Area
9. Active quarantine measures are the primary means to reduce the fatality rate of COVID-19
10. Estimates of the severity of coronavirus disease 2019: a model-based analysis
11. COVID-19 healthcare demand and mortality in Sweden in response to non-pharmaceutical mitigation and suppression scenarios
12. Real-world effectiveness of COVID-19 vaccines: a literature review and meta-analysis
13. Isolation, quarantine, social distancing and community containment: pivotal role for old-style public health measures in the novel coronavirus (2019-nCoV) outbreak
14. GIS-based spatial modeling of COVID-19 incidence rate in the continental United States
15. COVID-19 transmission in Mainland China is associated with temperature and humidity: A time-series analysis
16. A spatial analysis of the COVID-19 period prevalence in U.S. counties through
17. Impact of meteorological factors on COVID-19 pandemic: Evidence from top 20 countries with confirmed cases
18. Significance of geographical factors to the COVID-19 outbreak in India
20. Spatial analysis in epidemiology
21. Bayesian disease mapping: hierarchical modeling in spatial epidemiology
22. Bayesian modelling of inseparable space-time variation in disease risk. Statistics in medicine
23. Spatial and spatio-temporal Bayesian models with R-INLA
24. The positive impact of lockdown in Wuhan on containing the COVID-19 outbreak in China
25. The prevention and control plan of Novel coronavirus pneumonia
26. Population flow drives spatio-temporal distribution of COVID-19 in China
27. Spatial and temporal differentiation of COVID-19 epidemic spread in mainland China and its influencing factors
28. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
29. Can we contain the COVID-19 outbreak with the same measures as for SARS?
30. Estimation of the Transmission Risk of the 2019-nCoV and Its Implication for Public Health Interventions
31. 10 notice issued in Heilongjiang province: those who conceal or falsely report their illness to hazard public security will be investigated for criminal responsibility
32. There were 12 clusters of COVID-19 occurred in Hainan province
33. Analysis of epidemiological characteristics of 43 COVID-19 family clustered cases
34. A total of 195 clustered outbreaks reported in Jiangxi, 88.2% of which were family clusters