key: cord-304925-9gvx3swf authors: Xie, Zhixiang; Qin, Yaochen; Li, Yang; Shen, Wei; Zheng, Zhicheng; Liu, Shirui title: Spatial and temporal differentiation of COVID-19 epidemic spread in mainland China and its influencing factors date: 2020-07-14 journal: Sci Total Environ DOI: 10.1016/j.scitotenv.2020.140929 sha: doc_id: 304925 cord_uid: 9gvx3swf
Abstract This paper uses exploratory spatial data analysis and the geodetector method to analyze the spatial and temporal differentiation and the influencing factors of COVID-19 (coronavirus disease 2019) epidemic spread in mainland China, based on cumulative confirmed cases, average temperature, and socio-economic data. The results show that: (1) the epidemic spread rapidly from January 24 to February 20, 2020, and the distribution of epidemic areas tended to stabilize over time. The epidemic spread rate was higher in Hubei province, its surrounding areas, and some economically developed cities, and lower in western China and in remote areas of central and eastern China. (2) Both the global and the local spatial correlation of the epidemic distribution were positive. Specifically, global spatial correlation shifted from agglomeration to decentralization, while local spatial correlation was mainly composed of 'high-high' and 'low-low' clusters, which formed markedly contiguous patterns. (3) Population inflow from Wuhan and the strength of economic connection were the main factors affecting epidemic spread, while population distribution, transport accessibility, average temperature, and medical facilities also affected it to varying degrees. (4) The detection factors interacted mainly through mutual enhancement and nonlinear enhancement, and their combined influence on the epidemic spread rate exceeded that of single factors.
In addition, each detection factor has a value range that is most conducive to epidemic spread. [...] a total of 116 countries or regions worldwide had been hit by the COVID-19 epidemic, and more than 130,000 people had been diagnosed. As the epidemic continued to spread, several countries or regions were forced to take emergency measures such as locking down cities, halting production, suspending school classes, and restricting population movement, causing great harm to economic development and residents' health (An and Jia, 2020). Understanding the spatial and temporal changes of COVID-19 spread and clarifying its driving mechanism has therefore become an urgent scientific problem. Since the outbreak, scholars have carried out extensive studies on epidemic spread, and their results provide important guidance for epidemic prevention and control. Joseph et al. (2020) estimated the size of the epidemic with a mathematical model based on confirmed COVID-19 cases and residents' travel data (by train, plane, and road), and concluded that about 75,815 people had been infected in Wuhan during the early stage of the outbreak. David et al. (2020) compared COVID-19 with other viruses, warned that a sustained epidemic would pose a serious threat to global health, and proposed building a human-environment-animal health alliance to achieve the goal of sustainable development. Liu et al. (2020a) used exponential growth and maximum likelihood estimation to determine the transmission dynamics of COVID-19 in Wuhan, and found a mean incubation period of 4.8 days and basic reproduction number (R0) estimates of 2.90 (95% confidence interval (CI): 2.32-3.63) and 2.92 (95% CI: 2.28-3.67). Ai et al.
(2020) used statistical analysis to investigate the impact of the Wuhan lockdown (January 23, 2020) on the spread of COVID-19 in other parts of China. They estimated that implementing the lockdown 2 days earlier could have prevented 1,420 infections, whereas closing the city 2 days later would have caused 1,462 additional infections. Bai et al. (2020) used a transmission dynamics model to describe the evolution of the epidemic based on confirmed COVID-19 cases in Shaanxi province; they found that high-incidence areas were mainly located in Xi'an, Ankang, and Hanzhong, that the outbreak peaked in early February 2020, and that the basic reproduction number reached 2.95. Wang et al. (2020a) used Spearman correlation analysis to examine the relationship between COVID-19 incidence and the Baidu migration index in Guangdong province, and found a positive correlation between daily incidence and the 3-day migration index. Wang et al. (2020b) used a complex network model to explore the impact on the epidemic in Hubei province of resuming work in surrounding cities on February 17, February 24, or March 2, and concluded that resuming work on March 2 would not cause a second outbreak. Yan et al. (2020) predicted the trend of the COVID-19 epidemic with a time-delay dynamic model, and argued that the epidemic could be brought under control in the short term if prevention and control efforts were maintained. Chen and Cao (2020) conducted an epidemiological analysis of daily confirmed cases in China, concluding that epidemic prevention and control in China remained challenging and that targeted control measures should be formulated for the return of enterprises and personnel. Liu et al.
(2020b) analyzed the spatial and temporal characteristics of epidemic spread in Guangdong province, found that the prevention and control measures adopted were effective, and identified high-risk areas in economically developed regions. Liu et al. (2020c) used statistical analysis to examine the temporal and spatial characteristics and transmission paths of the COVID-19 epidemic in Zhuhai, revealing that cases imported from the epidemic area and family gatherings drove local spread. The research report published by the Yellow River Civilization (Ai et al., 2020; Wang et al., 2020b). Although some scholars have tried to reveal the rules of epidemic spread from a geographical perspective, they have mainly focused on its spatial and temporal evolution and have seldom discussed its driving causes (Liu et al., 2020c). (2) In terms of research methods, current studies mainly employ correlation and regression analyses, while modern information technology and spatial analysis methods are rarely applied (Wang et al., 2020a). (3) In terms of research scale, scholars generally investigate epidemic spread at the city or regional scale, and few studies work at the national level (Liu et al., 2020b). (4) In terms of data sources, COVID-19 case data are easy to obtain, but environmental and socio-economic data related to epidemic spread are difficult to obtain, which is why research on the driving mechanism of epidemic spread lags behind. In this paper, the number of confirmed COVID-19 cases in mainland China was taken as the measurement index, and the spatial and temporal differentiation of epidemic spread was described using exploratory spatial data analysis.
Then, the key factors affecting COVID-19 spread were identified using the geodetector method, in order to provide references for clarifying the rules of epidemic spread, formulating prevention and control policies, and promoting the resumption of work and production. The basic research objects of this paper are the administrative units at prefecture- [...] The Baidu migration data used in this paper cover January 11 to 23, 2020, and list, for each day, the top 100 destination cities of people leaving Wuhan. The average winter temperature data for each unit are from the weather network (https://www.tianqi.com). In addition, because population, gross domestic product, and the number of beds in medical institutions for each region during the COVID-19 epidemic period are unavailable, this paper uses the corresponding 2018 data, derived from the 2019 provincial statistical yearbooks or the 2018 statistical bulletins. Using the cumulative number of COVID-19 cases directly as a measure of the epidemic spread rate is biased because of the large differences in base population among regions of mainland China. Therefore, the epidemic spread rate was calculated by dividing the cumulative number of COVID-19 cases by the number of days elapsed since the first confirmed case:

$$V_i = \frac{S_i}{M - N_i}$$

where $V_i$ is the epidemic spread rate in region $i$; $S_i$ is the cumulative number of COVID-19 cases in region $i$ by February 20; $M$ is February 20; and $N_i$ is the date of the first confirmed case in region $i$, so that $M - N_i$ is the number of days between the two dates. Exploratory spatial data analysis was used to test whether the observed value of a unit is spatially correlated with the observed values of its neighboring units (Li et al., 2018).
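A minimal sketch of the spread-rate formula, assuming that $M - N_i$ is counted in whole calendar days with no +1 adjustment. How the paper treated units whose first case fell on February 20 (where the formula divides by zero) is not stated, so this sketch simply rejects them. The function name and the example unit are hypothetical.

```python
from datetime import date

# M in the formula: the end of the study period.
END_DATE = date(2020, 2, 20)


def spread_rate(cumulative_cases, first_case_date, end_date=END_DATE):
    """V_i = S_i / (M - N_i): cumulative cases per day since the first case."""
    days = (end_date - first_case_date).days  # M - N_i in whole days
    if days <= 0:
        # Undefined in the formula; handling such units is an assumption here.
        raise ValueError("first confirmed case must precede the end date")
    return cumulative_cases / days


# Hypothetical unit: 120 cases by Feb 20, first case on Jan 24 (27 days earlier).
print(round(spread_rate(120, date(2020, 1, 24)), 2))  # 4.44
```

Because $M$ is fixed, units reporting their first case earlier are divided by a larger day count, which is why the rate is only comparable across units at the same time node.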
The global Moran's I index was used to measure global spatial correlation, while the local Moran's I index from LISA (local indicators of spatial association) was used to measure local spatial correlation (Rong et al., 2016). Their formulas (Anselin, 1995; Gallo and Ertur, 2003) are as follows:

$$I = \frac{\sum_{i=1}^{K}\sum_{j=1}^{K} W_{ij}\,(X_i-\bar{X})(X_j-\bar{X})}{S^2\sum_{i=1}^{K}\sum_{j=1}^{K} W_{ij}}, \qquad S^2 = \frac{1}{K}\sum_{i=1}^{K}(X_i-\bar{X})^2$$

$$I^*_p = Z_p\sum_{q=1}^{K} W_{pq}\,Z_q$$

where $I$ is the global Moran's I index; $X_i$ and $X_j$ are the observed values of units $i$ and $j$, and $\bar{X}$ is their mean; $W_{ij}$ is the spatial weight matrix (1 for adjacent units, 0 for non-adjacent units); $S^2$ is the variance; $K$ is the number of observation units; $I^*_p$ is the local Moran's I index of unit $p$; $W_{pq}$ is the normalized (row-standardized) form of the spatial weight matrix; and $Z_p$ and $Z_q$ are the standardized observed values of units $p$ and $q$.

Spatial differentiation is a basic characteristic of geographical phenomena. The geodetector method measures the degree of spatial stratified heterogeneity and tests its significance; its premise is that a factor explains the spatial distribution of a variable when the within-strata variance is smaller than the between-strata variance. The geodetector comprises four modules: factor detection, interaction detection, risk detection, and ecological detection. Factor detection is expressed by the $q$ value (Wang and Xu, 2017):

$$q = 1 - \frac{\sum_{h=1}^{L} N_h\sigma_h^2}{N\sigma^2} = 1 - \frac{SSW}{SST}, \qquad SSW = \sum_{h=1}^{L} N_h\sigma_h^2, \quad SST = N\sigma^2$$

where $q$ represents the explanatory power of detection factor X for the spatial distribution of detected factor Y and ranges from 0 to 1; $h = 1, \dots, L$ indexes the strata of detection factor X and detected factor Y; $N_h$ and $N$ are the numbers of samples in stratum $h$ and in the whole study area; $\sigma_h^2$ and $\sigma^2$ are the variances of Y in stratum $h$ and in the whole study area; and $SSW$ and $SST$ are the within-strata sum of variances and the total variance of the whole study area. Interaction detection identifies whether detection factors X1 and X2, acting together, strengthen or weaken the explanatory power for detected factor Y. The steps are as follows: first, the $q$ values of X1 and X2 are calculated separately.
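The two Moran statistics above, together with the LISA quadrant labels ('high-high', 'low-low', and so on) used in the cluster maps, can be sketched in standard-library Python. This is an illustration under stated assumptions, not the GeoDa computation used in the paper: binary contiguity is given as neighbor index lists, $S^2$ is the population variance, row standardization is done by averaging a unit's neighbors, and significance uses a conditional permutation test with pseudo p-value (extreme + 1) / (permutations + 1), which follows the general idea of GeoDa's approach; its exact defaults may differ. Function names are hypothetical.

```python
import random
import statistics


def standardize(x):
    """Z values (X - mean) / S, where S^2 is the population variance."""
    mean = statistics.fmean(x)
    sd = statistics.pstdev(x)
    if sd == 0:
        raise ValueError("constant observations: Moran's I is undefined")
    return [(v - mean) / sd for v in x]


def global_morans_i(x, neighbors):
    """Global Moran's I with binary contiguity W given as neighbor lists.

    Dividing deviations by S turns sum W_ij (X_i - mean)(X_j - mean) / S^2
    into sum W_ij z_i z_j, so I = sum W_ij z_i z_j / sum W_ij.
    """
    z = standardize(x)
    w_total = sum(len(nb) for nb in neighbors)  # number of pairs with W_ij = 1
    cross = sum(z[i] * z[j] for i, nb in enumerate(neighbors) for j in nb)
    return cross / w_total


def lisa(x, neighbors, permutations=999, alpha=0.05, seed=1):
    """Local Moran's I*_p (row-standardized W) with a LISA quadrant label.

    Significance uses a conditional permutation test: unit p keeps its
    value while its neighbors' values are redrawn from the other units.
    """
    z = standardize(x)
    rng = random.Random(seed)
    results = []
    for p, nb in enumerate(neighbors):
        if not nb:
            results.append((0.0, None, "no neighbors"))
            continue
        lag = sum(z[q] for q in nb) / len(nb)  # sum_q W*_pq z_q
        i_p = z[p] * lag
        others = [z[q] for q in range(len(z)) if q != p]
        extreme = 0
        for _ in range(permutations):
            i_perm = z[p] * sum(rng.sample(others, len(nb))) / len(nb)
            if (i_p >= 0 and i_perm >= i_p) or (i_p < 0 and i_perm <= i_p):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        if p_value > alpha:
            label = "not significant"
        elif z[p] > 0:
            label = "high-high" if lag > 0 else "high-low"
        else:
            label = "low-high" if lag > 0 else "low-low"
        results.append((i_p, p_value, label))
    return results


# Four hypothetical units along a line: 0-1, 1-2, 2-3 are adjacent.
values = [10, 8, 1, 0]
adjacency = [[1], [0, 2], [1, 3], [2]]
print(round(global_morans_i(values, adjacency), 3))  # 0.405
```

With only four units, no permutation result can reach the 5% level, so `lisa` would label every unit "not significant" here; on the roughly 340 prefecture-level units in the study the test has far more room to discriminate.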
Second, a new layer X1∩X2 is obtained by overlaying layers X1 and X2, and its $q$ value, $q(X1 \cap X2)$, is calculated. Third, the interaction type between X1 and X2 is determined by comparing $q(X1)$, $q(X2)$, and $q(X1 \cap X2)$. Risk detection determines whether the mean value of an attribute differs significantly between two sub-regions, using the t-statistic:

$$t_{\bar{Y}_{h=1}-\bar{Y}_{h=2}} = \frac{\bar{Y}_{h=1}-\bar{Y}_{h=2}}{\left[\dfrac{\mathrm{Var}(Y_{h=1})}{N_{h=1}} + \dfrac{\mathrm{Var}(Y_{h=2})}{N_{h=2}}\right]^{1/2}}$$

where $\bar{Y}_h$ is the mean epidemic spread rate in stratum $h$, $N_h$ is the number of samples in stratum $h$, and Var denotes the variance. Ecological detection tests whether detection factors X1 and X2 differ significantly in their influence on the spatial distribution of detected factor Y, using the F-statistic:

$$F = \frac{N_{X1}(N_{X2}-1)\,SSW_{X1}}{N_{X2}(N_{X1}-1)\,SSW_{X2}}, \qquad SSW_{X1} = \sum_{h=1}^{L_1} N_h\sigma_h^2, \quad SSW_{X2} = \sum_{h=1}^{L_2} N_h\sigma_h^2$$

where $N_{X1}$ and $N_{X2}$ are the sample sizes of detection factors X1 and X2; $SSW_{X1}$ and $SSW_{X2}$ are the sums of the within-strata variances of the strata formed by X1 and X2; and $L_1$ and $L_2$ are the numbers of strata of X1 and X2. The null hypothesis is $H_0: SSW_{X1} = SSW_{X2}$. If $H_0$ is rejected at significance level $\alpha$, X1 and X2 have significantly different effects on the spatial distribution of Y. ArcGIS software was used to classify the cumulative number of COVID-19 cases into the following categories: 0; 1-50; 51-100; 101-300; and >300 persons. The epidemic spread rate was classified into the following categories: <1; 1-3; 3-5; 5-7; and >7 persons/day (Figure 1).
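The four geodetector modules can be sketched in plain Python, given a stratum label for every unit under each factor (for example, its natural-breaks class). This is an illustrative reimplementation, not the geodetector software used in the paper. The interaction categories follow the thresholds commonly cited from Wang and Xu (2017) as we understand them, p-values are omitted because the standard library has no t or F distribution, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance, variance


def _strata(y, x):
    """Group detected values y by the stratum label of detection factor x."""
    groups = defaultdict(list)
    for yi, xi in zip(y, x):
        groups[xi].append(yi)
    return list(groups.values())


def ssw(y, x):
    """SSW = sum over strata h of N_h * sigma_h^2."""
    return sum(len(g) * pvariance(g) for g in _strata(y, x))


def q_statistic(y, x):
    """Factor detector: q = 1 - SSW / SST, with SST = N * sigma^2."""
    sst = len(y) * pvariance(y)
    return 1 - ssw(y, x) / sst


def interaction(y, x1, x2):
    """Interaction detector: overlay X1 and X2, then compare q values."""
    q1, q2 = q_statistic(y, x1), q_statistic(y, x2)
    q12 = q_statistic(y, list(zip(x1, x2)))  # X1 ∩ X2 stratum = label pair
    if q12 < min(q1, q2):
        kind = "nonlinear weakening"
    elif q12 < max(q1, q2):
        kind = "single-factor nonlinear weakening"
    elif math.isclose(q12, q1 + q2):
        kind = "independent"
    elif q12 > q1 + q2:
        kind = "nonlinear enhancement"
    else:
        kind = "mutual enhancement"
    return q1, q2, q12, kind


def risk_t(y_a, y_b):
    """Risk detector t statistic between two strata (sample variances)."""
    se = math.sqrt(variance(y_a) / len(y_a) + variance(y_b) / len(y_b))
    return (fmean(y_a) - fmean(y_b)) / se


def ecological_f(y, x1, x2):
    """Ecological detector F = N1(N2 - 1)SSW1 / (N2(N1 - 1)SSW2)."""
    n1 = n2 = len(y)  # both factors are measured on the same units
    return (n1 * (n2 - 1) * ssw(y, x1)) / (n2 * (n1 - 1) * ssw(y, x2))


# Hypothetical data: y depends additively on two binary factors.
y = [1, 2, 5, 6, 10, 11, 14, 15]
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 1, 1, 0, 0, 1, 1]
print(interaction(y, x1, x2)[3])  # independent
```

With this in-sample $q$, overlaying X1 and X2 can only refine the strata, so $q(X1 \cap X2) \ge \max(q(X1), q(X2))$; the two weakening branches are kept only for completeness. This is consistent with the enhancement-only interactions reported later in the paper.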
(1) Global spatial correlation characteristics
In this paper, the cumulative number of confirmed COVID-19 cases and the epidemic spread rate were taken as variables, a spatial weight matrix based on geographical adjacency was selected, and the global Moran's I index, Z score, and P value of both variables were calculated in GeoDa to characterize global spatial correlation (Table 1). respectively, passing the significance test at the 1% level, implying that the spatial pattern of the epidemic spread rate was also characterized by a clustered distribution.
(2) Local spatial correlation characteristics
The global Moran's I index ignores possible instability in local spatial processes. Therefore, a LISA cluster map was drawn to analyze the local spatial correlation of the COVID-19 epidemic (Figure 2). cluster areas were located in Anqing, Lu'an, Jiujiang, Nanyang, Qianjiang, Shennongjia forest region, and Changde; and the range of the 'low-low' cluster was basically consistent with that on February 6. In terms of quantity, the number of units in the 'high-high' cluster first increased and then decreased; the number in the 'high-low' cluster decreased continuously until none remained; the number in the 'low-high' cluster first declined and then rose again; and the number in the 'low-low' cluster increased. Therefore, the layout of the cumulative number of confirmed COVID-19 cases did not change fundamentally across the time nodes and remained dominated by the 'high-high' and 'low-low' types.
This indicates that the local spatial correlation of confirmed COVID-19 cases was also dominated by positive correlation, although the clustering weakened. Overall, the 'high-high' cluster areas shifted from a concentrated to a dispersed layout and tended to stabilize over time, especially in Wuhan and its surrounding areas. The 'low-low' cluster areas formed a contiguous layout, mainly in Inner Mongolia, Gansu, Ningxia, Qinghai, Tibet, and Xinjiang. As for the epidemic spread rate, there were 16 administrative units in the 'high-high' cluster, 0 in the 'high-low' cluster, 5 in the 'low-high' cluster, and 75 in the 'low-low' cluster. The 'high-high' cluster areas were Wuhan, Huangshi, Yichang, Xiangyang, Ezhou, Jingmen, Xiaogan, Jingzhou, Huanggang, Xianning, Suizhou, Xiantao, Qianjiang, Tianmen, Nanyang, and Xinyang; the 'low-high' cluster areas were Anqing, Lu'an, Jiujiang, Shennongjia forest region, and Changde; and the 'low-low' cluster areas were located in western China. The COVID-19 epidemic first occurred in Wuhan and then spread to other parts of China. In this process, people were the carriers, the transportation network was the channel, and social and economic connections were the internal driving force. We therefore selected indicators reflecting population distribution, population inflow from Wuhan, traffic accessibility, economic connection intensity, average temperature, and medical facility conditions as the detection factors (Table 2), and the epidemic spread rate as the detected factor, to assess the formation mechanism of the spatial pattern of the COVID-19 epidemic. Note: The gravity model was used to calculate the economic connection intensity between each region and Wuhan, with travel time used as the distance (Meng and Lu, 2011).
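The text does not give the gravity model's exact form. A minimal sketch, assuming the form often used in Chinese city-linkage studies, $R_{ij} = \sqrt{P_i G_i}\,\sqrt{P_j G_j} / D_{ij}^{b}$, with population $P$, GDP $G$, travel time $D$, and a distance-decay exponent $b = 2$. These choices and the function name are assumptions, not confirmed by the source, and the magnitude of the result depends entirely on the units chosen for $P$, $G$, and $D$.

```python
import math


def economic_linkage(pop_i, gdp_i, pop_j, gdp_j, travel_time, b=2.0):
    """Assumed gravity form R_ij = sqrt(P_i G_i) * sqrt(P_j G_j) / D_ij ** b.

    travel_time plays the role of D_ij (time-based distance, as in the
    text); b = 2 is a common choice but is an assumption here.
    """
    if travel_time <= 0:
        raise ValueError("travel time must be positive")
    mass_i = math.sqrt(pop_i * gdp_i)  # combined "mass" of unit i
    mass_j = math.sqrt(pop_j * gdp_j)  # combined "mass" of unit j (e.g. Wuhan)
    return mass_i * mass_j / travel_time ** b
```

Under this form, doubling the travel time to Wuhan cuts a unit's linkage to one quarter, which is one way a time-based distance can dominate the resulting X4 values.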
First, the natural breaks (Jenks) classification method in ArcGIS 10.2 was used to divide each detection factor into 6 classes, and classified maps of the detection factors were drawn (Figure 3). Then, according to formulas (3)-(8), the explanatory power of the detection factors was calculated with the geodetector software to analyze the factors influencing epidemic spread.
(1) Factor detection analysis
The q values of all detection factors passed the significance test at the 5% level, indicating that each factor significantly explains the spatial distribution of the COVID-19 spread rate. Specifically, the q values (with p values in parentheses) of X1, X2, X3, X4, X5, and X6 were 0.060 (0.003), 0.504 (0.000), 0.041 (0.000), 0.404 (0.000), 0.021 (0.028), and 0.078 (0.000), respectively. Ranked by q value, population inflow from Wuhan was the primary factor affecting epidemic spread, with an explanatory power of 50.4%. Economic connection intensity was the second determinant, with an explanatory power of 40.4%. Medical facility conditions were the third, with an explanatory power of 7.8%. Population distribution explained 6.0%, while traffic accessibility (4.1%) and average temperature (2.1%) were relatively weak, both below 5%. Note that factor detection considers only the explanatory power of each single factor on the epidemic spread rate, not the interaction effects between factors.
(2) Interaction detection analysis
Interaction detection identifies the interactions between any two factors. Table 3 shows the interaction detection results.
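The six-class natural breaks stratification that precedes factor detection can be sketched as an exact dynamic program (Fisher's optimization). It minimizes the same objective as Jenks natural breaks: the total within-class sum of squared deviations. This is a swapped-in technique, not ArcGIS's own implementation, which may handle ties or large inputs differently and so can return slightly different breaks. The function names are hypothetical. `classify` turns the breaks into the stratum labels the geodetector needs.

```python
from bisect import bisect_left


def natural_breaks(values, k):
    """Return ascending upper bounds of k classes that minimize the total
    within-class sum of squared deviations (Fisher's exact optimization)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums give the squared deviations of any run xs[i:j] in O(1).
    s1, s2 = [0.0], [0.0]
    for v in xs:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # cost[c][j]: smallest total SSD when xs[:j] is split into c classes;
    # cut[c][j]: start index of the last of those classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], cut[c][j] = candidate, i
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])  # largest value in class c
        j = cut[c][j]
    return bounds[::-1]


def classify(value, bounds):
    """0-based stratum index of value, given ascending class upper bounds."""
    return bisect_left(bounds, value)
```

The dynamic program runs in $O(kn^2)$ time, which is negligible for a few hundred prefecture-level units and six classes.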
It can be seen from Table 3
(3) Ecological detection analysis
According to Table 4, most of the differences in influence among the detection factors were statistically significant. Specifically, the influence of population distribution (X1) on the spatial distribution of the epidemic spread rate differed significantly from that of population inflow from Wuhan (X2), economic connection intensity (X4), and average temperature (X5), but not from that of traffic accessibility (X3) or medical facility conditions (X6). The influence of population inflow from Wuhan (X2) differed significantly from that of traffic accessibility (X3), economic connection intensity (X4), average temperature (X5), and medical facility conditions (X6). The influence of traffic accessibility (X3) differed significantly from that of economic connection intensity (X4), but not from that of average temperature (X5) or medical facility conditions (X6). The influence of economic connection intensity (X4) differed significantly from that of average temperature (X5) and medical facility conditions (X6). There was no significant difference between average temperature (X5) and medical facility conditions (X6). Overall, the detection factors selected in this paper are reasonable, and most of the differences among them are statistically significant. Note: Y means that the difference in influence between two factors is significant at the 95% confidence level, while N means no significant difference.
(4) Risk detection analysis
Table 5 shows that the epidemic spread fastest when population density was 1,162-2,564 persons/km², when the proportion of population inflow from Wuhan was 6.94-14.25%, and when the economic connection intensity with Wuhan was 598,158.64-1,524,023.05.
The spread rate was also fastest when the geographical distance from Wuhan was 68.38-540.98 km. The epidemic spread rate was higher when the average winter temperature was 11-16°C, and when there were 9.58-14.49 beds per 1,000 persons. The results also show that population distribution, population inflow from Wuhan, economic connection intensity, and medical facilities were significantly positively correlated with the epidemic spread rate, while traffic accessibility was negatively correlated with it. This paper studied the spatial and temporal variation and the influencing factors of COVID-19 spread in mainland China, which can provide references for formulating public health policies and promoting the resumption of production. However, the study has the following limitations. In terms of data sources, although many countries or regions have published COVID-19 announcements in real time and epidemic data are easily accessible, most countries or regions had more infections than were registered, which could affect the accuracy of the results. Furthermore, because population density data for each administrative unit in mainland China during the epidemic period were unavailable, 2018 population density data were used instead. This substitution hides drastic changes in the data, because the COVID-19 epidemic occurred during the Chinese Spring Festival, when the scale and frequency of population movement intensify. Since population density is an important indicator for explaining the epidemic spread rate, the substitute data inevitably weaken the explanatory power of this study in this respect. In addition, the factors affecting epidemic spread are complex and involve both quantitative and non-quantitative indicators.
This paper constructed an indicator system of multiple influencing factors based on data availability, so non-quantitative indicators may have been omitted, which increases the uncertainty of the results. In terms of research methods, first, the spread-rate formula was used to compare the spread rates of different administrative units at three time nodes, but it does not reflect the exponential growth of infectious diseases (such as COVID-19, SARS, and MERS (Middle East Respiratory Syndrome)) in exposed populations. How to accurately measure the actual spread rate of the epidemic in each region is a direction for future research. Second, the exploratory spatial data analysis examined the spatial clustering of the COVID-19 epidemic across administrative units at the prefecture level and above, and did not consider clustering at finer spatial scales, which limits the practical value of the results. Third, the geodetector method, which is statistically based, was used to identify the value ranges most favorable to COVID-19 spread. The sample data directly affect the final results, and no epidemiological investigation of residents' health status was conducted, so the conclusions are subject to some uncertainty. Finally, there may be multicollinearity between economic connection intensity and other factors, which the geodetector method was not used to address; this weakens the persuasiveness of the results.
(1) The temporal changes of the COVID-19 epidemic in mainland China are clear, and the epidemic spread rate shows evident spatial variation.
In terms of temporal change, the epidemic spread quickly to most regions from January 24 to February 6. From February 6 to February 20 the spread rate slowed, although the situation in some cities worsened sharply. Rapid spread was concentrated in Hubei province, its surrounding areas, and some economically developed cities, while western China and remote areas of central and eastern China experienced slow spread. (2) The global and local spatial correlation of the COVID-19 epidemic was dominated by clustering. Specifically, global spatial correlation first strengthened and then weakened, while local spatial correlation tended to stabilize over time and was mainly composed of the 'high-high' and 'low-low' cluster types. The 'high-high' cluster areas were Wuhan, Huangshi, Yichang, Xiangyang, Ezhou, Jingmen, Xiaogan, Jingzhou, Huanggang, Xianning, Suizhou, Xiantao, Qianjiang, Tianmen, Nanyang, and Xinyang. The 'low-low' cluster areas were located in parts of Inner Mongolia, Gansu, Ningxia, Qinghai, Tibet, and Xinjiang. (3) Population distribution, population inflow from Wuhan, traffic accessibility, economic connection intensity, average temperature, and medical facility conditions had significant effects on the epidemic spread rate. Population inflow from Wuhan was the primary factor, followed by economic connection intensity and medical facility conditions. Population distribution, traffic accessibility, and average temperature also influenced epidemic spread to varying degrees.
In terms of the direction of influence, population distribution, population inflow from Wuhan, economic connection intensity, and medical facility conditions played a positive role in epidemic spread, while traffic accessibility played a negative role. (4) The detection factors interacted through mutual enhancement and nonlinear enhancement, and their combined influence on the epidemic spread rate exceeded that of single detection factors. The interactions between population inflow from Wuhan and medical facility conditions, between population distribution and population inflow from Wuhan, between population distribution and economic connection intensity, and between economic connection intensity and medical facility conditions had a strong influence on epidemic spread. The interactions between population distribution and traffic accessibility, between population distribution and average temperature, between traffic accessibility and average temperature, and between average temperature and medical facility conditions had little impact on epidemic spread.
influencing factors are analyzed.
2) The global and local spatial correlation of the epidemic distribution is positive.
3) Population inflow from Wuhan and the strength of economic connection are the main factors affecting epidemic spread.
4) The interactive influence of detection factors on epidemic spread exceeds that of single factors.
5) When the average winter temperature is 11-16°C, the epidemic spread rate is higher.
References
Population movement, city closure and spatial transmission of the 2019-nCoV infection in China
Analysis of the economic impact of the NCP and countermeasure study
Local indicators of spatial association-LISA
Early transmission dynamics of novel coronavirus pneumonia epidemic in Shaanxi province
Incidence trend of novel coronavirus (SARS-CoV-2)-infected pneumonia in China
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health-the latest 2019 novel coronavirus outbreak in Wuhan
Exploratory spatial data analysis of the distribution of regional per capita GDP in Europe
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modeling study
Evolution of patterns in the ratio of gender at birth in Henan province
Transmission dynamics of 2019 novel coronavirus
The diffusion characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in Guangdong province
Analysis of the spatio-temporal characteristics and transmission path of COVID-19 cluster cases in Zhuhai
Impact of high-speed railway on accessibility and economic linkage of cities along the railway in Henan province
Spatial differentiation patterns of carbon emissions from residential energy consumption in small and medium-sized cities: A case study of Kaifeng
Geodetector: Principle and prospective
Preliminary analysis on the early epidemic and spatiotemporal distribution of new coronavirus pneumonia in Guangdong province
When will be the resumption of work in Wuhan and its surrounding areas during COVID-19 epidemic? A data-driven network modeling analysis
Determinants and identification of the northern boundary of China's tropical zone
Modeling and prediction for the trend of outbreak of NCP based on a time-delay dynamic system
Quantifying the influence of nature and socioeconomic factors and their interactive impact on PM 2.5 pollution in China
Backtracking transmission of COVID-19 in China based on big data source