key: cord-0740167-oatrjbo3 authors: Kang, Dayun; Choi, Hyunho; Kim, Jong-Hun; Choi, Jungsoon title: Spatial epidemic dynamics of the COVID-19 outbreak in China date: 2020-05-31 journal: International Journal of Infectious Diseases DOI: 10.1016/j.ijid.2020.03.076 sha: 4e6373f25e266b9d3797cdc3d981615969543d16 doc_id: 740167 cord_uid: oatrjbo3 Abstract Background On 31 December 2019 an outbreak of COVID-19 in Wuhan, China, was reported. The outbreak spread rapidly to other Chinese cities and multiple countries. This study described the spatio-temporal pattern and measured the spatial association of the early stages of the COVID-19 epidemic in mainland China from 16 January–06 February 2020. Methods This study explored the spatial epidemic dynamics of COVID-19 in mainland China. Moran’s I spatial statistic with various definitions of neighbours was used to conduct a test to determine whether a spatial association of the COVID-19 infections existed. Results The spatial spread of the COVID-19 pandemic in China was observed. The results showed that most of the models, except medical-care-based connection models, indicated a significant spatial association of COVID-19 infections from around 22 January 2020. Conclusions Spatial analysis is of great help in understanding the spread of infectious diseases, and spatial association was the key to the spatial spread during the early stages of the COVID-19 pandemic in mainland China. On 31 December 2019 the Chinese government first reported an outbreak of coronavirus disease in Wuhan, the capital of Hubei Province in China. The outbreak rapidly spread from Wuhan into all provinces of China and at least 24 countries. As of 06 February 2020 31,481 cases of COVID-19 were officially confirmed in mainland China, including 639 deaths. A total of 22,112 cases were confirmed in Hubei Province, accounting for 70.89% of the total cases. Until now, studies evaluating the spatial spread of the COVID-19 pandemic in China are limited. 
However, understanding the spatial spread of the COVID-19 outbreak is critical to predicting local outbreaks and developing public health policies during the early stages of COVID-19. Previous studies have described the spatial spread of severe acute respiratory syndrome (SARS) in Beijing and mainland China (Meng et al., 2005; Fang et al., 2009). One study also considered the different types of connections between cities to calculate the spatial association (Meng et al., 2005). Other studies have analyzed the epidemic data of the Middle East respiratory syndrome coronavirus (MERS-CoV) in Saudi Arabia using various spatial approaches (Adegboye et al., 2017; Lin et al., 2018; Al-Ahmadi et al., 2019). This study investigated the spatial epidemic dynamics of the COVID-19 outbreak in mainland China. It also measured and compared the spatial association of the daily epidemic data. Different spatial connection assumptions between the provinces regarding possible pathways for the spread of COVID-19 (Meng et al., 2005) were considered. The objective was to provide spatial dynamic information about the spread of COVID-19 for infection prevention and control.

The COVID-19 dataset was obtained from a Chinese website that provides real-time information on outbreaks of epidemic diseases (https://ncov.dxy.cn/ncovh5/view/pneumonia). The website updates data on newly confirmed cases in mainland China by province and date. There are 31 provinces in mainland China, and this study used 3 weeks' data from 16 January to 06 February 2020, which was during the early stages of COVID-19 in China. Data before 16 January 2020, the very early stage of COVID-19, were not examined because of data reliability concerns. Other datasets, such as population, population density, number of licensed doctors, and hospital and health centre beds per 1000 inhabitants by province, were acquired from a website (Statista, 2020).
All population-related and medical resource datasets were collected in 2018; these were the most recent data that could be obtained. Figure 1 shows a map of cumulative cases by province. The number of cumulative cases is the sum of the newly confirmed cases from 16 January to 06 February 2020. The largest number of cases was in Hubei Province, of which Wuhan is the capital city. Figure 2 presents the population and population density (population/km²) for each province in 2018. Guangdong and Shanghai had the highest population and population density, respectively. Hubei ranked ninth in population and thirteenth in population density. As shown in Figure 3, Shandong has the highest number of doctors and hospital beds (per 1000 inhabitants), whereas Hubei ranks ninth and seventh for the number of doctors and hospital beds, respectively. Table 1 shows detailed information for each province.

To show the spatial association of COVID-19, Moran's I statistic was used for each day with various types of neighbourhoods (Li and Calder, 2007). Moran's I statistic measures the spatial autocorrelation and is calculated as follows:

$$I = \frac{n}{\sum_{i}\sum_{j} W_{ij}} \cdot \frac{\sum_{i}\sum_{j} W_{ij}\,(Y_i - \bar{Y})(Y_j - \bar{Y})}{\sum_{i} (Y_i - \bar{Y})^2}$$

where $n$ was the number of regions, $i$ and $j$ were the region indexes and $W_{ij}$ indicated the adjacency between area $i$ and area $j$. This study considered different types of adjacency. $Y_i$ and $Y_j$ denoted the number of newly confirmed cases in areas $i$ and $j$, respectively, and $\bar{Y}$ was the average of the number of newly confirmed cases in the entire region. A value of 0 indicated that there was no spatial autocorrelation in the data. A positive Moran's I value indicated the clustering of similar values, whereas a negative Moran's I value indicated the clustering of dissimilar values. The larger the absolute Moran's I value, the stronger the spatial autocorrelation. The number of cases in this study was skewed, so the spatial dependency may not have been properly captured.
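As an illustration of the statistic described above, the following is a minimal sketch in plain Python, assuming the standard definition of Moran's I. The function name `morans_i`, the four-region chain and the toy values are hypothetical and are not taken from the study; the sketch does not attempt to reproduce any weight normalisation or the p-value calculation performed by the R package the study used.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and an n x n weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations from the overall mean
    # Sum of all weights W_ij
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    # Weighted cross-products of deviations between areas i and j
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    # Sum of squared deviations
    ss = sum(d * d for d in dev)
    return (n / w_sum) * (cross / ss)


# Hypothetical toy example: four regions on a line, neighbours share a border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # dissimilar values sit next to each other

print(morans_i(clustered, chain))    # positive: clustering of similar values
print(morans_i(alternating, chain))  # negative: clustering of dissimilar values
```

In this toy case the clustered arrangement gives a positive value and the alternating arrangement a negative one, which matches the interpretation of the sign given in the text.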
Therefore, to adjust for the skewness, logarithmic transformation of the newly confirmed cases was used instead of the number of cases itself. Because there were many zeros in the dataset, 0.01 was added to the data for log transformation. Similar to that in a previous study (Meng et al., 2005), six different types of neighbourhoods were used. In Model 1, two provinces were considered adjacent if they shared a border. In Model 2, the distance between two provinces was used. In this case, the centroid for each province was determined using the gCentroid function in the rgeos package of statistical software R (Bivand et al., 2019). The distance between two provinces was defined as the Euclidean distance between the centroids of these provinces. The extent to which the two provinces were adjacent was defined as the inverse of the distance. In Models 1 and 2, spatial adjacency was defined by geographical information, which is the usual method for examining spatial relationships. As COVID-19 is spread from person to person, population and population density were the key foci. Thus, Models 3 and 4 considered population and population density, respectively. The provinces were ranked by population (Model 3) or population density (Model 4). A province was defined as adjacent to both the previous and following ranked provinces. Thus, the first-ranked and last-ranked provinces only had one adjacent neighbour. In terms of medical care resources, Models 5 and 6 considered the number of doctors and the number of hospital or medical centre beds, respectively. The definition of an adjacent neighbour was the same as that in Models 3 and 4. Moran's I was computed with the Moran.I function in the ape package of the statistical software R (Paradis et al., 2019). The significance level of Moran's I test was 0.05.

Figure 4 shows the time series plot of the newly confirmed cases for each day. The number of cases for each day was the sum of all cases in mainland China. As shown in Figure 4, the number of cases increased almost exponentially.
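The log transformation and the neighbourhood definitions described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code: the function names, the toy population figures and the toy centroid coordinates are all hypothetical. Ties in rank are broken by input order here, which is an assumption, as the text does not say how ties were handled.

```python
import math


def log_transform(cases, offset=0.01):
    """Log-transform case counts after adding a small offset to handle zeros."""
    return [math.log(c + offset) for c in cases]


def rank_adjacency(attribute):
    """Rank-based neighbours (as in Models 3-6): a province is adjacent to the
    provinces ranked immediately before and after it on the given attribute."""
    n = len(attribute)
    # Province indexes ordered from highest to lowest attribute value
    order = sorted(range(n), key=lambda k: attribute[k], reverse=True)
    w = [[0] * n for _ in range(n)]
    for a, b in zip(order, order[1:]):
        w[a][b] = 1
        w[b][a] = 1
    return w


def inverse_distance_weights(centroids):
    """Distance-based weights (as in Model 2): inverse Euclidean distance
    between province centroids, with zeros on the diagonal."""
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = centroids[i][0] - centroids[j][0]
                dy = centroids[i][1] - centroids[j][1]
                w[i][j] = 1.0 / math.hypot(dx, dy)
    return w


# Hypothetical toy inputs for four provinces (not data from the study).
population = [50, 10, 30, 20]
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

rank_w = rank_adjacency(population)
dist_w = inverse_distance_weights(centroids)
logged = log_transform([0, 10])
```

With the toy population above, the highest-ranked and lowest-ranked provinces each have a single neighbour, as the text states for the first-ranked and last-ranked provinces.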
To prevent an exponential spread over mainland China, it was important to detect the spatial spread in the early stages. Because COVID-19 spread from Hubei Province, the epicentre of the outbreak, the number of newly confirmed cases in the provinces neighbouring Hubei was investigated. The provinces of Hunan, Sichuan and Tianjin were selected as representative areas of first-order, second-order and third-order neighbouring provinces, respectively. The daily number of confirmed cases in Hubei is shown in the upper panel of Figure 5. The lower panel of Figure 5 shows the daily number of confirmed cases in Hunan, Sichuan and Tianjin. From 22 January the number of newly confirmed cases in Hunan clearly increased. The infection first increased in Hubei and then in the first-order neighbouring provinces such as Hunan, and the second-order neighbouring provinces such as Sichuan. The infection finally spread to the third-order neighbouring provinces, including Tianjin. This supports the view that COVID-19 spread spatially and that investigating spatial dependency is essential.

This study examined whether a spatial association existed in the cases of COVID-19 in China. It used Moran's I statistic, a measure of spatial association, for the number of confirmed cases with different types of neighbourhoods. Figure 6 shows Moran's I statistic and its p-value for each day in Models 1-6. Overall, the p-values in Figure 6 are very close to 0 in Models 1-5, except for the first few days. On 22 January Models 1-4 first detected a significant spatial dependency in the number of newly confirmed cases. Since approximately 24 January, the number of newly confirmed cases shows significant spatial dependency in Models 1 and 2. The maximum value of Moran's I statistic in Models 1 and 2 is 0.4598 and 0.0841, respectively. The further the statistic is from 0, the stronger the spatial dependency. Both maximum values, 0.4598 and 0.0841, are significant, with p-values < 0.05.
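To illustrate how a p-value can be attached to an observed Moran's I, the sketch below uses a permutation test. This is a different technique from the one behind the p-values reported in the study, which came from the R package cited in the text, so the sketch should not be read as a reproduction of that calculation. The function names, the 12-region chain, the toy values, the number of permutations and the random seed are all hypothetical.

```python
import random


def morans_i(values, weights):
    """Moran's I (standard definition) for values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    return (n / w_sum) * (cross / ss)


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation.

    Values are shuffled across regions n_perm times; the p-value is the share
    of shuffles (plus the observed arrangement) whose Moran's I is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical toy example: 12 regions on a line with two clear clusters.
n = 12
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = [1.0] * 6 + [10.0] * 6

observed, p_value = permutation_p_value(values, chain)
print(observed, p_value)
```

Comparing `p_value` with the 0.05 significance level mirrors the decision rule stated in the text, although the numerical p-values from a permutation test and from the R package need not agree.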
For population-related neighbourhoods, both Models 3 and 4 show a spatial clustering tendency since 22 January, except for 2 and 4 days, respectively. Among the days with significant spatial dependency, the maximum value is 0.6991 and 0.7336 in Models 3 and 4, respectively. Models 3 and 4 also have significant p-values < 0.05. For medical-care-based neighbourhoods, Model 5 shows a spatial association since 23 January, except for 2 days. However, no spatial association exists in Model 6. Since 23 January, the average p-value is 0.0129 and 0.6638 in Models 5 and 6, respectively, which is a substantial difference.

This study is the first to provide information on the spatial and temporal patterns of the COVID-19 pandemic in mainland China. In the early stage of the COVID-19 outbreak, new cases occurred intensively in Hubei Province. Over time, the cases spread to provinces neighbouring Hubei; the first-order neighbouring provinces showed a particularly increased number of confirmed cases after 22 January. Then, the second-order and third-order provinces showed a steeply increasing number of cases from 23 January and 24 January, respectively. This pattern shows the outward spatial spread of COVID-19 from Hubei. Eventually, the impact spread to all provinces in mainland China. This study investigated the spatial dependency through Moran's I with different types of spatial connections. Except for the medical centre bed-based neighbourhood, a spatial clustering tendency was observed in every neighbourhood type from approximately 22 January. The regions connected by express trains to Wuhan, such as Shenzhen, Shanghai and Beijing, had five, two and two confirmed cases, respectively, on 21 January, the early stage of COVID-19 in China. This implies the possibility that COVID-19 had spread from Wuhan to other areas via transportation (Zhao et al., 2020). This possibility also supports the spatial dependency we detected in this study.
On 23 January, the Chinese government closed off Wuhan City to prevent the spread of COVID-19. These findings could link with such a government policy. The results of the evaluation using geographical and distance-based neighbourhoods showed that COVID-19 is highly likely to spread between geographically adjacent regions. This may be because people in adjacent regions tend to interact with each other. In addition, Moran's I using population-based neighbourhoods also showed a strong spatial association. More people are likely to be infected with the virus in densely populated regions, which leads to the active spread of COVID-19 to other areas. Finally, having many doctors in a region indicates that the region can accommodate many severely ill patients, which can lead to the spread of the virus. This result is consistent with that of a previous study (Meng et al., 2005) . In addition, this study conducted the same spatial analysis using the ranks of the newly confirmed cases in a nonparametric approach because the data are quite skewed. The results were almost the same, except that there was a spatial association for a few more days. COVID-19 has been affecting countries worldwide, and the World Health Organization has declared the COVID-19 outbreak a public health emergency of international concern. This study demonstrated that in the early stages of the COVID-19 pandemic, the disease dramatically spread from region to region in mainland China. Examining the spatial spread in the early stages is very important to prevent further transmission. It is believed that this study is the first to investigate the virus's spatial spread to various types of neighbourhoods in mainland China. Although this study was conducted in the early stages of the COVID-19 outbreak to determine whether there was a spatial association, it did have a few limitations. First, it used the reported dataset for the daily number of newly confirmed cases in the 31 provinces of China. 
This did not include the number of suspected cases, so it was difficult to understand the spatio-temporal transmission of COVID-19. However, it was important to investigate the spatial and temporal characteristics of the COVID-19 outbreak at an early stage. Second, it considered six types of neighbourhoods; other types of neighbourhoods were not covered in this study, such as the urban-rural relationship, which might have also been significant (Meng et al., 2005). Third, it only investigated spatial spread in mainland China. As infections have also occurred in other countries, investigating the global spatial spread of COVID-19 might be important to manage COVID-19. Future research, such as a study examining the spatial tendencies of the deaths and recoveries from COVID-19, will contribute to the control and prevention of this disease. Through such work, it will be possible to determine which factors affect death and recovery.

Ethics approval and consent to participate: No human or animal samples were included in the research presented in this article; therefore, ethical approval was not necessary.

Availability of data and materials: The datasets used and analysed during the current study are available from the websites https://ncov.dxy.cn/ncovh5/view/pneumonia and http://statista.com.
Spatial modelling of contribution of individual level risk factors for mortality from Middle East respiratory syndrome coronavirus in the Arabian Peninsula
Spatiotemporal clustering of Middle East respiratory syndrome coronavirus (MERS-CoV) incidence in Saudi Arabia
Interface to Geometry Engine
Geographical spread of SARS in mainland China
Beyond Moran's I: testing for spatial dependence based on the spatial autoregressive model
Modeling the spread of Middle East respiratory syndrome coronavirus in Saudi Arabia
Understanding the spatial diffusion process of severe acute respiratory syndrome in Beijing
Analyses of Phylogenetics and Evolution
The association between domestic train transportation and novel coronavirus (2019-nCoV) outbreak in China from 2019 to 2020: a data-driven correlational report

Competing interests: The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: This work was supported by the research fund of the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07047712) and by the Government-wide R&D Fund project for infectious disease research (GFID), Republic of Korea (grant number: HG18C0088).

Authors' contributions: J.C. designed the study; D.K. and H.C. contributed to data acquisition; D.K., H.C. and J.C. carried out the statistical analysis; D.K., H.C., J.H.K., and J.C. drafted the manuscript. All authors contributed to the interpretation of data and revision of the manuscript. All authors read and approved the final manuscript.

We thank all of the people who were struggling in the healthcare fields to overcome the COVID-19 outbreak. This study was performed under the research project named 'Research and Development on Integrated Surveillance System for Early Warning of Infectious Diseases (RISEWIDs).'