key: cord-0969074-6nend8se
authors: Jin, Biao; Ji, Jianwan; Yang, Wuheng; Yao, Zhiqiang; Huang, Dandan; Xu, Chao
title: Analysis on the spatio-temporal characteristics of COVID-19 in mainland China
date: 2021-06-07
journal: Process Saf Environ Prot
DOI: 10.1016/j.psep.2021.06.004
sha: c01ef1dcf9f53581cbab0f5bde96fad08ad57d29
doc_id: 969074
cord_uid: 6nend8se

COVID-19 has had many adverse effects on humankind and has taken many lives. Only by understanding it more deeply and comprehensively can it be defeated. This paper studies the spatio-temporal characteristics of the epidemic's development at the provincial level in mainland China and at the city level in Hubei Province. In addition, a correlation analysis is conducted on the factors that may cause the spatial differences in the severity of the epidemic. Three methods are then adopted to fit the daily-change tendency of the number of confirmed cases in mainland China and Hubei Province: the Logistic Growth Model (LGM), polynomial fitting, and a Fully Connected Neural Network (FCNN). The analysis of the spatio-temporal differences and their influencing factors shows that: (1) The Chinese government contained the domestic epidemic in early March 2020; since then, the number of newly confirmed cases has shown almost zero increase. (2) Throughout mainland China, effective human intervention measures such as community isolation and urban isolation have significantly weakened the influence of the underlying factors that may affect the spatial differences of the epidemic. (3) The classification results based on the number of confirmed cases also demonstrate, from another angle, the effectiveness of the isolation measures adopted by the governments at all levels in China.
This is reflected in the small monthly changes in grade (or no change at all) for the provinces of mainland China and the cities of Hubei Province during the study period. Based on the curve-fitting experiments, and weighing time cost against goodness of fit, this paper recommends the Polynomial(Degree = 18) model for fitting the daily-change tendency of the number of confirmed cases.

The coronavirus disease 2019 (COVID-19) has spread worldwide, with confirmed cases appearing in more than 200 countries. COVID-19 affects people's daily lives and the operation of the social economy, and it has cost many people their lives. It is the common enemy of all humankind. China was the first country to report COVID-19 to the United Nations and society, and the Chinese government and its people have made significant contributions to the fight against COVID-19. The Chinese government has been announcing worldwide, nearly in real time, the numbers of confirmed, new, fatal, cured, and suspected cases, as well as the response measures it has taken [1]. These measures enable people to follow the development of COVID-19 in China and provide decision support and reference experience for other countries coping with COVID-19. Because the data are open, many researchers can also carry out research on COVID-19. In this paper, the data about the number of confirmed cases in China are [...] daily-change tendency of the number of confirmed cases is carried on. This paper aims to understand the spatio-temporal differences of the epidemic at both the provincial level in mainland China and the city level in Hubei Province.
It also demonstrates, to a certain extent, that the epidemic prevention measures adopted by the governments at all levels in mainland China are effective.

Since the outbreak of COVID-19, researchers worldwide have carried out a great deal of research on it. This research can be divided mainly into the following six categories: 1) to study the impact of COVID-19 on human physical and mental health from a biomedical perspective [2, 3, 4]; 2) to study the impact of COVID-19 on human production, life, and social and economic development from a sociological perspective [5, 6, 7, 8, 9]; 3) to propose new mathematical models, or revise existing ones, based on relevant data for predicting and analyzing the development of the epidemic in a specific area [10, 11, 12, 13, 14, 15, 16, 17, 18]; 4) to analyze the spatio-temporal characteristics of the epidemic in a specific area [19, 20]; 5) to explore related factors that may affect the development of the epidemic [21]; 6) to evaluate the effects of different epidemic prevention measures [22, 23]. In terms of research purpose and content, the third, fourth, and fifth categories are the most relevant to the work carried out in this paper.

To complement medical actions to counter the spread of infections such as COVID-19, Vianello et al. [10] have carried out some significant work. [...] The authors of [11] aimed to demonstrate the effectiveness of using parameter regression methods to calibrate a SIRD model for COVID-19. The response of the effective reproduction number to non-pharmaceutical interventions (NPIs) is non-linear and variable in rate, magnitude, and direction. During the experiments, they exploited the sophisticated parameter regression functionality of a commercial chemical engineering simulator with piecewise continuous integration and with event and discontinuity management.
Their main contribution is a strategy for calibrating and validating a model, rather than a fully optimized model or an attempt to predict the future course of the COVID-19 pandemic. Considering that the classic rate-law assumption central to SIR compartmental models does not always hold, Mun and Geng [12] designed a modified mathematical model for non-first-order kinetics. [...] it is possible to distinguish four infection stages of epidemics/pandemics: the starting stage (infection outbreak), the early stage (infection transmission), the mature stage (infection mitigation), and the final stage (infection extinction). By the time that work was published, Hubei Province was in the final stage, while South Korea had just entered the mature stage. They claimed that each phase's kinetic parameters could be properly estimated once all the data and the related convergence paths were collected. In particular, the model progressively improves its predictions every day to support all the countries affected by the SARS-CoV-2 pandemic in making decisions and organizing supplies and human resources. Hu et al. [15] propose a dynamic growth rate model to analyze the characteristics and trends of the global outbreak of COVID-19. The model is derived from the ordinary differential equation for infectious diseases, and its generality was tested using the epidemic data of [...] in China. They use the model to predict the inflection points of countries facing serious outbreaks and to forecast their future trends. Cao et al. [16] established a COVID-19 SEIR transmission dynamics model that takes transmission ability in the latent period into consideration. Based on the epidemic data of Hubei Province from January 23, 2020, to February 24, 2020, they fitted the parameters of the modified SEIR model. Mojjada et
al. [17] aim to demonstrate the ability of Machine Learning (ML) models to predict the number of individuals affected by COVID-19 as a potential threat to human beings. Their work shows that Linear Regression (LR) effectively predicts new cases, deaths, and recoveries. Yang et al. [18] use a modified susceptible-exposed-infected-removed (SEIR) epidemiological model that incorporates domestic migration data before and after January 23 and the most recent COVID-19 epidemiological data to predict the epidemic progression. Further, they corroborate their model prediction using a machine-learning artificial intelligence (AI) approach trained on the 2003 SARS coronavirus outbreak data. Lv et al. [19] use Crystal Ball and GIS software to explore the spatial and temporal characteristics of COVID-19 from January 25 [...]

The Natural Breaks method is adopted for classification, in order to discover and compare more intuitively the differences in the distribution of confirmed cases across regions. The Coefficient of Variation (CV) is used to evaluate the changes in the level of each region across months.

The Natural Breaks method [25] is a statistical classification method based on the statistical distribution of the values. It maximizes the differences among classes. In Eq. (1) [...]

The CV is a statistic that measures the degree of variation of the observations in the data. It is dimensionless, which makes it possible to compare the dispersion of two data sets objectively. Like the range, standard deviation, and variance, the CV reflects the dispersion of the data, but as a relative rather than an absolute measure. Its magnitude is affected by both the dispersion and the average level of the variable. Eq. (2) can be used to calculate the value of the CV.

The LGM is often used to model population data, biological population growth, economic indicators, and data from other fields.
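As a rough illustration of how a logistic growth curve can be fitted to cumulative counts, the following minimal sketch uses a three-parameter logistic form and synthetic data. The functional form K / (1 + exp(-r (t - t0))), the names `logistic` and `fit_logistic`, the grid search over K combined with a straight-line fit, and all numbers are assumptions made for illustration; they are not taken from this paper, whose own LGM equation and fitting procedure are not reproduced here.

```python
import numpy as np


def logistic(t, K, r, t0):
    """Assumed logistic form: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_logistic(t, y, K_grid):
    """Grid-search K; for each K fit ln(K/y - 1) = -r*t + r*t0 by a line.

    Assumes every observation in y is strictly positive.
    """
    best = None
    for K in K_grid:
        if K <= y.max():  # the linearization needs K above every observation
            continue
        z = np.log(K / y - 1.0)
        slope, intercept = np.polyfit(t, z, 1)
        r = -slope
        t0 = intercept / r
        sse = float(np.sum((logistic(t, K, r, t0) - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    if best is None:
        raise ValueError("every candidate K is at or below max(y)")
    return best[1], best[2], best[3]


# Synthetic, hypothetical cumulative counts (not the paper's data).
t = np.arange(60, dtype=float)
y = logistic(t, K=80000.0, r=0.25, t0=25.0)
K_hat, r_hat, t0_hat = fit_logistic(t, y, np.linspace(70000.0, 90000.0, 21))
```

The candidate with the smallest sum of squared errors in the original (non-linearized) scale is kept, so the plateau K acts as the saturation level that an exponential model lacks.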
Unlike the exponential model, the LGM reduces its growth rate once growth reaches a particular stage, until a specific maximum value is reached. In addition, it is widely used in complex [...] 3).

Suppose the polynomial obtained by fitting is [...] Table 2. If m > 2, the FCNN can be considered a Deep Neural Network (DNN). The nonlinear fitting capability of a DNN is powerful, and it can fit almost any function.

The goodness of fit refers to how well the regression line fits the observations. The statistic that measures the goodness of fit is the coefficient of determination (R² ∈ [0, 1]), according to Eq. (4), where RSS is the abbreviation of 'Residual Sum of Squares' while TSS is that of 'Total Sum of Squares'.

This section first analyzes the spatio-temporal characteristics of COVID-19 in China from January 16, 2020, to July 31, 2020. Then, the indicators that may cause these spatio-temporal differences are explored. Finally, the fitting results for the daily-change tendency of the number of confirmed cases obtained with the three methods are compared and evaluated.

[...] The most direct evidence for this conclusion is that the correlation coefficient of the two change curves in Figure 3(a) and Figure 3(b) is approximately 99.78%. The epidemic variations during the study period of this paper can be divided into three stages. [...]

This section analyzes the spatial differences among all the provinces in mainland China and all the cities of Hubei Province. The Natural Breaks method is adopted to conduct the classification based [...] Table 3. A smaller coefficient means less volatility. According to the classification results, most provinces show little volatility in their grades, which is reflected in their small variation coefficients, some of which are even 0.
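To make the role of the variation coefficient concrete, the following minimal sketch computes the CV of month-end grades for two regions. It assumes the common definition CV = standard deviation / mean with the population standard deviation; the paper's Eq. (2) is not reproduced here, so that choice, the function name `coefficient_of_variation`, and the grade sequences are all hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population form; an assumption here)."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


# Hypothetical month-end grades (levels 1 to 6) for two regions.
stable_region = [3, 3, 3, 3, 3, 3, 3]
borderline_region = [2, 3, 3, 2, 3, 3, 3]

cv_stable = coefficient_of_variation(stable_region)
cv_borderline = coefficient_of_variation(borderline_region)
```

A region whose grade never changes has a CV of exactly 0, while a region whose case count sits near a class boundary and flips between two adjacent grades has a small positive CV.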
The relatively higher variation coefficients of some regions, such as Shanxi, Ningxia, Gansu, and Inner Mongolia, arise mainly because their confirmed case numbers happen to lie on the dividing line between the n-th level and the (n + 1)-th level.

The classification results based on the number of confirmed cases of each city in Hubei Province at the end of each month are shown in Figure 5(a) to [...] Since the number of cities in Hubei Province is small, the changes in their classification results can be displayed intuitively and clearly in a figure. The classification results are presented directly in Figure 6. Note that the ordinate values in Figure 6 correspond to the six levels in Figure 5. A smaller level number means fewer confirmed cases. As seen in Figure 6, the classification results of each city remained essentially unchanged during the study period. This demonstrates, to a certain extent, the rationality and effectiveness of the centralized isolation, community isolation, and home isolation measures adopted by local governments at all levels. These measures have effectively curbed the spread of the epidemic across regions.

The following eighteen possible impact indicators (Table 1) [...] Table 4. From Table 4 it can be argued that: 1) At the provincial level, the correlation [...]

To make this method comparable with the LGM, the experiment in this section seeks the polynomial whose R² approximates that of the LGM. Polynomials of different highest degrees are fitted, and the R² value in each case is calculated. The calculation results are shown in Table 5. [...] The fitting effects are shown in Figure 8. The fitting results also illustrate, to a certain extent, that a neural network can theoretically fit any function. The three kinds of methods on the data about mainland China and Hubei [...] Table 6 to Table 9.
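The degree sweep described for the polynomial experiment can be sketched as follows: fit polynomials of increasing degree and record R² = 1 - RSS/TSS for each. This is a minimal sketch under stated assumptions. The synthetic curve, the list of degrees, and the use of `numpy.polynomial.Polynomial.fit` are illustrative choices and are not the paper's data or its fitting routine, which the text does not specify.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y, y_fit):
    """Coefficient of determination: 1 - RSS / TSS."""
    rss = float(np.sum((y - y_fit) ** 2))       # residual sum of squares
    tss = float(np.sum((y - np.mean(y)) ** 2))  # total sum of squares
    return 1.0 - rss / tss


# Synthetic, hypothetical cumulative curve (not the paper's data).
days = np.arange(120, dtype=float)
cases = 80000.0 / (1.0 + np.exp(-0.08 * (days - 40.0)))

r2_by_degree = {}
for degree in (3, 6, 9, 12, 18):
    # Polynomial.fit rescales the day axis to [-1, 1], which keeps
    # high-degree least-squares fits numerically better conditioned.
    poly = Polynomial.fit(days, cases, degree)
    r2_by_degree[degree] = r_squared(cases, poly(days))
```

Because each lower-degree polynomial is a special case of a higher-degree one, R² on the fitted data cannot decrease as the degree grows; a high R² on the training curve therefore says nothing by itself about over-fitting.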
[...] Table 5 to Table 9, the following conclusions can be drawn: i) A comprehensive comparison of Table 5, Table 6, and Table 8 shows that the LGM is more accurate than the Polynomial models with Degree < 11. ii) It can be concluded from Table 6 and Table 8 [...] iii) A comprehensive comparison of Table 6, Table 7, Table 8, and Table 9 shows that to achieve an accuracy similar to that of Polynomial(Degree = [...]

[...] Polynomial-fitting, some solutions can be considered: 1) to add training data samples; 2) to introduce regularization; 3) to use cross-validation; 4) to perform a more robust data regression using a sigmoidal function and assign different weights to different steady-state points; 5) to evaluate the impact of polynomial fitting as a function of the polynomial order, since oscillations are not feasible once a stable condition is reached; 6) to refer to other model calibration methods, such as [11]. Given the fitting method and the amount of experimental data used in this paper, introducing regularization is preferred. The so-called 'regularization' introduces the L1-norm or L2-norm of the parameter vector into the original loss function. The L1-norm and L2-norm penalties are denoted as $\lambda \|w\|_1$ and $\frac{\lambda}{2} \|w\|_2^2$, respectively. Compared with the L1-norm, the L2-norm is more popular. The new loss function with the L2-norm introduced can be described as [...] The vector w contains the coefficients of each term in the polynomial f. The over-fitting issue can then be mitigated by adjusting the value of λ. [28] provides an effective way to obtain an appropriate value for λ.

COVID-19 has caused many adverse effects on human production, life, and health, and has even threatened human life.
It is challenging to predict the trend of the COVID-19 epidemic accurately: 1) people's understanding of this virus is not comprehensive enough, and its variants continue to appear; 2) although many prevention measures have proven effective, it is difficult to evaluate the effectiveness of specific measures quantitatively; 3) it is hard to achieve absolute isolation among individuals and among regions. In the battle against COVID-19, human beings are still in the passive defense stage. However, it should be firmly believed that COVID-19 will be defeated, since many researchers have been working on it from different perspectives. Their hard work and significant research achievements provide us with more and more professional knowledge, effective prevention measures (e.g., [22]), and excellent mathematical analysis or prevention models (e.g., [10]). The research results in this paper demonstrate, to a certain extent, the effectiveness of the epidemic prevention measures adopted by the governments at all levels in mainland China. The measures are worth learning from.

It should be pointed out that it would be more scientific and accurate to collect the data on the relevant indicators over the same time interval as the data on the number of confirmed cases. However, the data on the relevant indicators are not released in real time on the official websites of the corresponding departments in mainland China. Although many of these data are recorded in real time or regularly, only their owners or public security organizations have the right to access them. As an alternative, this paper could only obtain them from the statistical yearbooks.
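The L2-regularization remedy for over-fitting preferred in the discussion of polynomial fitting can be sketched as follows. This is a minimal sketch under stated assumptions: the original loss is taken to be the sum of squared residuals, the penalty is (λ/2)||w||² applied to all coefficients including the constant term, and the inputs are assumed to be pre-scaled to [-1, 1]. The function name `ridge_polyfit`, the data, and the λ values are hypothetical, and λ is not selected here by the method of [28].

```python
import numpy as np


def ridge_polyfit(x, y, degree, lam):
    """Minimize sum((X @ w - y)**2) + (lam / 2) * ||w||^2 in closed form."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    # Setting the gradient to zero gives (X^T X + (lam / 2) I) w = X^T y.
    A = X.T @ X + 0.5 * lam * np.eye(degree + 1)
    return np.linalg.solve(A, X.T @ y)


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)  # inputs already scaled to [-1, 1]
# Hypothetical noisy observations of a saturating curve.
y = np.tanh(3.0 * x) + 0.05 * rng.standard_normal(x.size)

w_weak = ridge_polyfit(x, y, degree=9, lam=1e-8)
w_strong = ridge_polyfit(x, y, degree=9, lam=10.0)
```

Increasing λ shrinks the coefficient vector, which damps the oscillations that a high-degree polynomial tends to show on a plateau, at the cost of a somewhat worse fit to the training points.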
of the People's Republic of China, Make every effort to prevent and control COVID-19
COVID-19 pandemic and its impact on mental health of healthcare professionals
Impact of COVID-19 pandemic on mental health in the general population: A systematic review
The impact of COVID-19 on sexual health: A preliminary framework based on a qualitative study with clinical sexologists
The impact of COVID-19 on stock market performance in Africa: A Bayesian structural time series approach
The impact of COVID-19 on housing price: Evidence from China
Impacts of COVID-19 pandemic on user behaviors and environmental benefits of bike sharing: A big-data analysis
The impact of COVID-19 on the European football ecosystem? A Delphi-based scenario analysis
Impacts of COVID-19 on energy demand and consumption: Challenges, lessons and emerging opportunities
A perspective on early detection systems models for COVID-19 spreading
COVID-19: Mechanistic model calibration subject to active and varying non-pharmaceutical interventions
An epidemic model for non-first-order transmission kinetics
Efficient artificial intelligence forecasting models for COVID-19 outbreak in Russia and Brazil
Analogies between SARS-CoV-2 infection dynamics and batch chemical reactor behavior
A dynamic growth rate model and its application in global COVID-19 epidemic analysis
Study on the epidemic development of COVID-19 in Hubei province by a modified SEIR model
Machine learning models for COVID-19 future forecasting
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
Research on the temporal and spatial characteristics of the COVID-19 in Hubei province with the use of Crystal Ball and GIS
Comparison of spatiotemporal transmission characteristics of COVID-19 and its mitigation strategies in China and the US
Correlation between local air temperature and the COVID-19 pandemic in Hubei, China
First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment
Combined measures to control the COVID-19 pandemic in Wuhan, Hubei, China: A narrative review
The data model concept in statistical mapping
Fitness of morbidity and discussion of epidemic characteristics of SARS based on logistic models
A simple model for a SARS epidemic
Strong robust generalized cross-validation for choosing the regularization parameter