key: cord-325012-yjay3t38
authors: Chen, Ze-Liang; Zhang, Qi; Lu, Yi; Guo, Zhong-Min; Zhang, Xi; Zhang, Wen-Jun; Guo, Cheng; Liao, Cong-Hui; Li, Qian-Lin; Han, Xiao-Hu; Lu, Jia-Hai
title: Distribution of the COVID-19 epidemic and correlation with population emigration from Wuhan, China
date: 2020-02-28
journal: Chin Med J (Engl)
DOI: 10.1097/cm9.0000000000000782
sha: 
doc_id: 325012
cord_uid: yjay3t38

BACKGROUND: The ongoing new coronavirus pneumonia (Coronavirus Disease 2019, COVID-19) outbreak is spreading in China but has not yet reached its peak. Five million people emigrated from Wuhan before the lockdown, potentially representing a source of virus infection. Determining the case distribution and its correlation with population emigration from Wuhan in the early stage of the epidemic is of great importance for early warning and for the prevention of future outbreaks. METHODS: Official case reports on the COVID-19 epidemic were collected up to January 30, 2020. Time and location information on COVID-19 cases was extracted and analyzed using ArcGIS and WinBUGS software. Data on population migration from Wuhan city and Hubei province were extracted from Baidu Qianxi, and their correlation with the number of cases was analyzed. RESULTS: Confirmed cases and deaths in Hubei province accounted for 59.91% (5806/9692) and 95.77% (204/213) of the respective totals in China. Hot spot provinces included Sichuan and Yunnan, which are adjacent to Hubei. The time risk of Hubei province on a given day was 1.960 times that on the previous day. The number of cases in some cities was relatively low, but the time risk appeared to be rising continuously. The correlation coefficient between the provincial number of cases and emigration from Wuhan was as high as 0.943. The lockdown of 17 cities in Hubei province and the implementation of nationwide control measures effectively prevented exponential growth in the number of cases. CONCLUSIONS: The population that emigrated from Wuhan was the main source of infection in other cities and provinces. Some cities with a low number of cases showed a rapid increase in case load. Owing to the upcoming Spring Festival return wave, understanding the risk trends in different regions is crucial to ensure preparedness at both the individual and organizational levels and to prevent new outbreaks.

Emerging infectious diseases are a major challenge in the 21st century. In recent years, outbreaks of Ebola and Middle East Respiratory Syndrome have caused great health and economic losses worldwide. [1, 2] The ongoing new coronavirus pneumonia (Coronavirus Disease 2019, COVID-19) outbreak is becoming a global public health problem. The COVID-19 outbreak is highly similar to the severe acute respiratory syndrome (SARS) outbreak of 2003; both were caused by new coronaviruses and occurred during periods overlapping with the Chinese Spring Festival. [3] On December 31, 2019, the Wuhan Municipal Health Committee reported 27 cases of pneumonia of unknown cause, many of which were traced to the Wuhan Southern China Seafood Market; the market was closed on January 1, 2020. [4] On January 7, 2020, laboratory tests identified the pathogen causing this previously unexplained pneumonia as a new type of coronavirus; the disease was later officially named COVID-19 by the World Health Organization. [5, 6] The COVID-19 outbreak started in Wuhan and spread rapidly to other provinces and countries. [7, 8] As of January 30, 2020, a total of 34 provinces and regions in China had reported 9692 cases, and nearly all imported cases originated from Wuhan in Hubei province. [9, 10] COVID-19 has been classified as a class B infectious disease but is managed as a class A infectious disease by the Chinese government. Case reports are released daily, and any omission or concealment is punishable by law. Currently, the number of cases is still increasing and the epidemic has not yet reached its peak; however, the situation differs from province to province. Information on the temporal and spatial distributions of cases is important for developing targeted treatment and prevention strategies.
Because the return peak of Spring Festival travel is approaching, information on likely changes in the incidence of COVID-19 in different cities will help improve preparedness for disease prevention and management. Therefore, in this study, we investigated the temporal and spatial distributions of the early COVID-19 epidemic to reveal the dynamic changes and trends in reported cases. These results will provide valuable information for disease prevention at both the individual and organizational levels.

All officially reported confirmed and suspected cases of COVID-19 and related deaths were collected from the official websites of health departments or from articles citing their reports. Case data were imported into Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and analyzed.

The national and Hubei province shapefiles were used for analysis in ArcGIS (Environmental Systems Research Institute, Redlands, CA, USA). The map was linked to an Excel file containing time and location information. Location data were available for 34 provinces of China and 17 prefecture-level cities of Hubei province. The time span was from January 16 to January 30, 2020. The COVID-19 risk analysis was based on a Bayesian space-time model implemented in WinBUGS software (MRC Biostatistics Unit, Cambridge, UK). [11, 12] The model was divided into three levels:

(1) Data model

The low-incidence count data were assumed to follow a Poisson distribution with parameters $n_i$ and $m_{it}$: $y_{it} \sim \text{Poisson}(n_i m_{it})$. For Hubei province, $y_{it}$ is the number of cases occurring in city $i$ ($i = 1, \ldots, 17$) on day $t$ ($t = 1, \ldots, 15$); nationwide, $y_{it}$ is the number of cases occurring in province $i$ ($i = 1, \ldots, 34$) on day $t$ ($t = 1, \ldots, 15$). We assumed that the number of people at risk in each city did not change during the study period, so that $n_i$ is the number of people at risk in city $i$ and $m_{it}$ is the corresponding daily disease risk in city $i$ on day $t$.
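As a minimal sketch of this data model (not the authors' WinBUGS code), the Poisson log-likelihood of a table of daily counts can be computed directly from $n_i$ and $m_{it}$. The function names and the toy inputs are hypothetical, and the sketch assumes every expected count $n_i m_{it}$ is strictly positive.

```python
import math
from typing import Sequence


def poisson_log_pmf(y: int, mean: float) -> float:
    """log P(Y = y) for Y ~ Poisson(mean); assumes mean > 0."""
    # log(mean^y * e^-mean / y!), using lgamma(y + 1) = log(y!)
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def data_log_likelihood(
    counts: Sequence[Sequence[int]],
    n: Sequence[float],
    m: Sequence[Sequence[float]],
) -> float:
    """Sum of log P(y_it | n_i, m_it) over regions i and days t.

    counts[i][t]: observed cases y_it in region i on day t
    n[i]:         people at risk n_i (held constant over the study period)
    m[i][t]:      daily disease risk m_it in region i on day t
    """
    total = 0.0
    for i, row in enumerate(counts):
        for t, y in enumerate(row):
            # Expected count for region i on day t is n_i * m_it
            total += poisson_log_pmf(y, n[i] * m[i][t])
    return total


# Hypothetical toy input: 2 regions observed for 3 days
counts = [[0, 1, 3], [2, 2, 5]]
n = [10_000, 50_000]
m = [[1e-4, 2e-4, 3e-4], [5e-5, 6e-5, 8e-5]]
print(data_log_likelihood(counts, n, m))
```

In the full Bayesian model, this likelihood is combined with the priors of the process and parametric levels, and the sampling is carried out by WinBUGS.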

(2) Process model

The logarithmic transformation of the disease risk $m_{it}$ allows the relative risk to be expressed as a linear combination of spatial, temporal, and spatiotemporal interaction components. The mathematical expression is shown in Equation (1):

$$\log(m_{it}) = a + s_i + (b_0 t^* + v_t) + b_{1i} t^* + e_{it} \qquad (1)$$

where $a$ is the fixed effect representing the overall relative risk in the entire study area over the 11-day period, and $t^* = t - 5.5$ is the time relative to the midpoint of the observation period. In this model, the risk of disease is decomposed into three parts: spatial change, temporal change, and space-time interaction. The spatial component $s_i$ describes the disease risk in city $i$ relative to the risk in the entire study region over the 11-day observation period. The temporal component $b_0 t^* + v_t$ represents the overall trend of disease risk in the entire study area relative to the midpoint observation day; it comprises the linear trend $b_0 t^*$ and the temporal random effect $v_t$, where the time coefficient $b_0$ describes the overall time trend in the study area. The space-time interaction component $b_{1i} t^*$ allows each city to have its own time trend: $b_{1i}$ measures how the local trend in city $i$ departs from the overall trend $b_0$. Finally, $e_{it}$ captures local variation that cannot be explained by the spatial, temporal, and space-time components. [13]

(3) Parametric model

According to the Besag, York, and Mollié (BYM) model, [14] the spatial structure effect is given a conditional autoregressive (CAR) prior. This requires a spatial adjacency weight matrix: $w_{ij} = 1$ if regions $i$ and $j$ are adjacent and $w_{ij} = 0$ otherwise, with $w_{ii} = 0$ by definition. Similarly, $b_{1i}$ is also assumed to follow the BYM structure. For the temporal structure effect $v_t$, a CAR process is used, with an adjacency weight matrix defined in time. For the overdispersion parameter $e_{it}$, following Gelman, a normal distribution with a mean of 0 and a variance of $\sigma_e^2$ is generally assumed, and the variance of each parameter is given a Gamma(a, b) prior. [15] Based on this model, the spatial component $s_i$ and its posterior probability can be used to identify high- or low-risk cities relative to the average risk ($a$) in the entire study area. Based on the probability that the spatial relative risk $\exp(s_i)$ is greater than 1, regions can be divided into five categories: those with probability >0.8, 0.6-0.8, 0.4-0.6, 0.2-0.4, and <0.2 are defined as hot spots, secondary hot spots, warm spots, subcold spots, and cold spots, respectively. Similarly, probability thresholds can be used to identify regional differences in the trend over time. Based on the probability that $\exp(b_{1i})$ is greater than 1, regions can again be divided into five categories. A probability greater than 0.8 indicates that the risk in a city is changing much more rapidly than the overall risk; 0.6 to 0.8, that the change in incidence risk is greater than the overall change; 0.4 to 0.6, that the change in incidence risk is the same as the overall change; 0.2 to 0.4, that the change in disease risk is smaller than the overall change; and less than 0.2, that the change in disease risk is much smaller than the overall change.
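The five-way classification above can be sketched in Python, assuming posterior draws of $s_i$ (for example, exported from the MCMC chains) are available as a list of floats. The helper names are hypothetical, and so is the handling of boundary values: a probability of exactly 0.8, 0.6, 0.4, or 0.2 is assigned to the lower category, because the text does not say which side the boundaries belong to.

```python
from typing import Sequence

# Lower cut-offs and labels for P(exp(s_i) > 1), taken from the text.
# Assumption: a probability exactly on a cut-off falls into the lower category.
SPATIAL_CATEGORIES = [
    (0.8, "hot spot"),
    (0.6, "secondary hot spot"),
    (0.4, "warm spot"),
    (0.2, "subcold spot"),
]


def exceedance_probability(draws: Sequence[float]) -> float:
    """Share of posterior draws with exp(s_i) > 1, which is equivalent to s_i > 0."""
    return sum(1 for s in draws if s > 0) / len(draws)


def classify(prob: float) -> str:
    """Map P(exp(s_i) > 1) to one of the five categories."""
    for cutoff, label in SPATIAL_CATEGORIES:
        if prob > cutoff:
            return label
    return "cold spot"


# Hypothetical posterior draws of s_i for one region
draws = [0.12, -0.05, 0.30, 0.08]
p = exceedance_probability(draws)
print(p, classify(p))
```

The same logic applies to the trend term: replacing the draws of $s_i$ with draws of $b_{1i}$ gives the probability that $\exp(b_{1i}) > 1$, and the five cut-offs then map to the trend categories instead.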

Population migration data were collected from the Baidu Qianxi website (http://qianxi.baidu.com/). Data on emigration from Wuhan city and Hubei province to other cities and provinces were extracted and edited in Microsoft Excel for Windows (Microsoft Corporation). Emigration intensity was calculated as the migration index multiplied by the migration proportion for each destination province or city. Correlation analysis was performed using IBM SPSS Statistics software (version 22; International Business Machines Corporation, Armonk, NY, USA). P values less than 0.05 were considered statistically significant. Pearson correlation coefficients greater than 0.2 were considered to indicate a positive correlation.
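A minimal sketch of the two calculations in this paragraph, assuming a single migration index value for the origin and a list of destination shares. The format of the Baidu Qianxi data is not described here, so the input layout, the function names, and the toy numbers are hypothetical; Pearson's r is computed from its textbook definition rather than with SPSS.

```python
import math
from typing import List, Sequence


def emigration_intensity(migration_index: float, shares: Sequence[float]) -> List[float]:
    """Intensity for each destination = origin migration index x share going there."""
    return [migration_index * share for share in shares]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation and variations without the common 1/(n-1) factor (it cancels)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inputs: one origin index and three destination provinces
intensity = emigration_intensity(8.5, [0.40, 0.25, 0.10])
cases = [120, 70, 30]
print(pearson_r(intensity, cases))
```

If the index and shares are reported per day, one possible extension (again an assumption) is to compute the intensity for each day and sum over the study period before correlating it with case counts.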

To obtain a general profile of the case distribution, we first analyzed all the available cases during this COVID-19 outbreak. [16] As shown in Figure 1A, the number of cases remained stable from January 11 to 15, 2020, and the numbers of new and cumulative cases increased rapidly after January 16. The first death was reported on January 10, and the number of deaths began to increase rapidly from January 17 onwards, with the cumulative number of deaths reaching 213 on January 30 [Figure 1B]. [6] After the nucleic acid assay became available, suspected cases awaiting laboratory confirmation could be diagnosed rapidly. [17] After January 19, the number of suspected cases increased rapidly, and about 40% to 50% of these suspected cases were subsequently confirmed [Figure 1C]. Before January 19, the number of severe cases remained low, but it increased steadily from January 20 onwards [Figure 1D]. Because Wuhan is the capital city of Hubei province and the virus spread quickly throughout the province, we also analyzed the changes in the number of cases in Hubei province. On January 9, 41 cases were first reported, and by January 30, 5806 cases had been reported, accounting for 59.91% (5806/9692) of the total cases in China [Figure 1E]. The cumulative number of deaths in Hubei province reached 204, accounting for 95.77% (204/213) of all deaths in China [Figure 1F]. These data indicated that both the incidence and mortality of COVID-19 were highest in Hubei province. [18] Before January 16, cases were mainly reported in Hubei province. From January 17 onwards, the outbreak spread to many provinces and the number of cases increased rapidly. Therefore, our spatial and temporal analyses used data from January 17 to 30, 2020. The location of each case was extracted from official reports and mapped onto the national map at the city level using ArcGIS. Of the 362 cities, 307 (84.8%, 307/362) had reported cases.
In general, the core outbreak area, Wuhan, and its surrounding cities had the highest numbers of cases, followed by populous cities that are transportation hubs. The spatial distribution was then analyzed with a Bayesian model using WinBUGS. The model converged after nearly 100,000 iterations; after convergence, it was run for another 110,000 iterations to obtain parameter estimates. Generally, a convergence ratio close to 1 indicates that the iterative sequences of the two chains agree and that the model has converged and is stable [Figure 2A]. Using the established model and parameters, hot and cold spots were identified. Sichuan, Yunnan, Guizhou, Hainan, and Taiwan were hot spots, and Inner Mongolia, Gansu, Ningxia, Qinghai, Xinjiang, Chongqing, Hunan, and Guangxi were secondary hot spots. Generally, hot spots clustered in the midwest, and cold spots clustered in the southeast [Figure 2B].
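The text does not name the convergence ratio, but for two chains it is most likely the Gelman-Rubin potential scale reduction factor (an assumption). WinBUGS reports a Brooks-Gelman-Rubin variant based on the widths of 80% intervals; the sketch below uses the classic variance-based form instead, so its values are not directly comparable with WinBUGS output. The function name and the chain values are hypothetical.

```python
import statistics
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Classic potential scale reduction factor (R-hat) for m >= 2 chains of equal length n >= 2."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # B: between-chain variance, scaled by the chain length n
    between = n * sum((mu - grand_mean) ** 2 for mu in chain_means) / (m - 1)
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Hypothetical post-burn-in draws of b0 from two chains
chain_1 = [0.46, 0.45, 0.47, 0.46, 0.44, 0.48]
chain_2 = [0.45, 0.47, 0.46, 0.46, 0.45, 0.47]
print(round(gelman_rubin([chain_1, chain_2]), 3))
```

Values near 1 mean the spread between chains is no larger than the spread within them, which is the sense in which the text treats a ratio close to 1 as evidence of convergence.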

The overall temporal trend was calculated using the time risk term $\exp(b_0 t^* + v_t)$, which describes the general incidence risk over time between January 16 and 30, 2020. The estimate of $b_0$ was 0.4604; that is, the disease risk on a given day was approximately 1.585 times that on the previous day. The relative risk increased steadily from January 20 onwards, and the upward trend continued as of January 30 [Figure 2C], indicating that the number of cases nationwide was still rising. As shown in Figure 2D, Heilongjiang, Hebei, Beijing, Tianjin, Xinjiang, Ningxia, Jiangsu, Hunan, Taiwan, and Hainan showed a faster increase in the number of cases than the country overall. The number of cases in Jilin, Liaoning, Shaanxi, Guangxi, and Fujian provinces also increased relatively fast [Supplementary Table 1, http://links.lww.com/CM9/A210]. The increase in other provinces was consistent with or lower than the overall national trend [Figure 2D].
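The link between $b_0$ and the daily ratio can be made explicit. As a simplification for this worked step (an assumption, not part of the fitted model), hold the random effects fixed, so that $v_{t+1} = v_t$ and $e_{i,t+1} = e_{it}$. Since $t^*$ increases by exactly 1 from one day to the next, the process model $\log(m_{it}) = a + s_i + (b_0 t^* + v_t) + b_{1i} t^* + e_{it}$ gives

$$
\frac{m_{i,t+1}}{m_{it}}
= \exp\big[\log m_{i,t+1} - \log m_{it}\big]
= \exp\big[(b_0 + b_{1i})\big((t+1)^* - t^*\big)\big]
= e^{b_0 + b_{1i}}.
$$

Dropping the local deviation $b_{1i}$ leaves the study-wide ratio $e^{b_0}$, so $b_0 = 0.4604$ corresponds to $e^{0.4604} \approx 1.585$. A region with $b_{1i} > 0$ has a larger daily ratio than the overall trend, which is the comparison summarized in Figure 2D.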

Since Hubei province had the highest number of cases, we analyzed the temporal and spatial distribution in different cities of Hubei province. Wuhan had the highest number of cases, followed by Huanggang and Xiaogan. Suizhou, Jingmen, and Xianning formed a second group with a high number of cases. The convergence analysis used 100,000 iterations [Figure 3A]. Hot spots were identified in the eastern region and cold spots in the western region [Figure 3B]. The overall temporal trend in the number of cases was calculated using the model. The average time trend coefficient $b_0$ was estimated to be 0.6727, indicating that the time risk (occurrence probability in time) on a given day was 1.960 times that on the previous day and suggesting that the daily number of cases in Hubei province was rising [Figure 3C]. Xiangyang, Suizhou, Yichang, and Ezhou showed the highest increase rates, and Shiyan, Shennongjia, Xiaogan, and Huangshi showed relatively high increase rates [Figure 3D]. Other cities grew more slowly than the province overall [Supplementary Table 2, http://links.lww.com/CM9/A210]. The daily risk ratio in Hubei province (1.960) was higher than that in the whole country (1.585), indicating that cases increased markedly faster in Hubei province than in other provinces of China.
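The two daily ratios quoted in this section follow from exponentiating the estimated trend coefficients. A quick check using only the $b_0$ values reported in the text (the helper name is hypothetical):

```python
import math


def daily_risk_ratio(b0: float) -> float:
    """Day-over-day risk ratio exp(b0) implied by the linear trend term b0 * t*."""
    return math.exp(b0)


# Trend coefficients b0 reported in the text
estimates = {"national": 0.4604, "Hubei": 0.6727}
for region, b0 in estimates.items():
    print(f"{region}: {daily_risk_ratio(b0):.3f}")
# national: 1.585
# Hubei: 1.960
```

Because the ratio is multiplicative, the gap between the two trends compounds with each additional day.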

The outbreak started in Wuhan, which is located in Hubei province, and nearly all early cases originated from this city. Because the outbreak occurred just before the Spring Festival, large-scale population migration during this period influenced the subsequent epidemic. From January 1 to 23, 2020, the population migrating out of Wuhan city and Hubei province increased steadily, peaking on January 21 and 22 [Figure 4A]. Wuhan was placed under lockdown on January 23, after which population migration was greatly reduced. In 2019, high population migration was observed on January 31; the timely lockdown in 2020 prevented a subsequent outbreak burst. We analyzed migration into and out of Wuhan city and Hubei province. The top destinations for emigration included Henan and Hunan provinces [Figure 4B]. More people migrated out of Wuhan than into the city [Figure 4C]. To analyze the correlation between the number of cases and emigration from Wuhan city and Hubei province, population migration data were collected from Baidu Qianxi. The correlation coefficient between the provincial number of cases and emigration from Hubei province was 0.719 [Figure 4D]. The correlation coefficient between the provincial number of cases and emigration from Wuhan was higher, at 0.943, and the highest coefficient, 0.996, was observed between Wuhan and other cities of Hubei province [Figure 4E and 4F; Supplementary Tables 3 and 4, http://links.lww.com/CM9/A210]. These data strongly indicate that the number of cases was closely related to population emigration from Wuhan. Although we do not know the exact number of people who emigrated from Wuhan, 5 million is a very large number, considering that each individual may be a potential virus carrier. Had no control measures been implemented, the number of cases would have increased exponentially.
Of the 5 million emigrants, 74.22% emigrated to other cities of Hubei province [Supplementary ...]. Because the outbreak period overlaps with the Spring Festival travel waves, large-scale migration will be a strong determinant of the characteristics of this outbreak. We analyzed the migration in the 3 days before the Spring [...] cities with high immigration were relatively scattered. Chongqing had the highest immigration, accounting for 1.50% of the total number of immigrants [Figure 4]. As migrants travel back to work after the Spring Festival, the cities showing high "emigration" may be at high risk of another wave of new cases owing to the return of the migrants.

COVID-19 is causing great public health and economic losses in China. The number of cases has increased rapidly, with over 70% coming from Hubei province. [16, 19] As of January 30, the number of cases had exceeded the total number of cases in the SARS-CoV outbreak. [20] As of February 15, 2020, the cumulative number of confirmed cases was 70,533, nearly ten times that recorded during the SARS outbreak. Prevention and control of the outbreak have required concerted action from the entire population of China. Although everyone has been involved in the campaign against the outbreak, some people in areas with few cases assumed that they were safe from the disease. Therefore, awareness of high-risk regions is important for preparing individuals, particularly in regions with low incidence. Further, it must be noted that 5 million people emigrated from Wuhan to all parts of the country. [21] We do not know exactly how many of them are virus carriers, and it is impossible to track and diagnose them all. Evidence from previous cases shows that asymptomatic patients in the incubation period are also infectious, which makes tracking virus carriers even more challenging. Therefore, home isolation and reduced contact with others are the most efficient measures to prevent infection and transmission. To reduce transmission, the Spring Festival holiday has been extended from January 31 to February 2. The opening of all schools and universities has been delayed, and online teaching programs have been launched. Factories have been required to delay the resumption of work or to allow employees to work from home.

We analyzed the temporal and spatial distribution of reported cases. In general, the number of cases is still rising. In Hubei province, which has the highest numbers of cases and deaths, the growth trend is relatively stable. Conversely, in other hot spots, the number of cases was not very high, but it continued to grow. Hence, these areas should be closely monitored. [22] It is particularly noteworthy that cities with the fastest change in temporal risk, such as Chongqing, also have large population movements. If they are not strictly monitored, further outbreaks may occur. To prevent outbreaks caused by the return travel wave after the Spring Festival, the country has extended the Spring Festival holiday.

Correlation analysis showed that early incidence was closely related to the emigration waves from Wuhan; that is, the higher the migrating population index, the larger the number of cases. This also supports the view that the first generation of cases in each province mainly came from Wuhan. However, as the epidemic progresses, migrants are spreading the virus to other people and becoming an important source of local community transmission. Therefore, it is necessary to strictly implement isolation and related control measures in accordance with the guidelines. In particular, control measures must be taken to prevent the spread of disease in communities, which is crucial to prevent a large-scale outbreak.

Very soon, many company staff will return to their workplaces. Because many enterprises in China are labor intensive, with large workforces, human-to-human transmission can occur very easily. Therefore, workers need to meet isolation requirements after returning to the city and use personal protection at work to prevent clustered outbreaks. There have already been several reports of employee infections linked to the resumption of work; these serve as a warning for all enterprises. Megacities such as Guangzhou, Shenzhen, and Shanghai, which have the largest numbers of migrant workers, need to be prepared for this.

From February 16, the number of new cases began to decrease, but the epidemic did not stop completely. Therefore, we must act together to stop the spread of the disease. At present, the state has adopted mobility control measures to encourage people to avoid going to public places and wear masks when going out to reduce the risk of human-to-human transmission. We believe that with the joint efforts made by everyone, the number of cases and losses will be kept to a minimum.

The challenge of emerging and re-emerging infectious diseases

Risks to healthcare workers with emerging diseases: lessons from MERS-CoV, Ebola, SARS, and avian flu

Bats are natural reservoirs of SARS-like coronaviruses

Origin of viruses: primordial replicators recruiting capsids from hosts

First case of 2019 novel coronavirus in the United States

A novel coronavirus from patients with pneumonia in China

Novel Wuhan (2019-nCoV) coronavirus

Drug treatment options for the 2019-new coronavirus (2019-nCoV)

Emerging understandings of 2019-nCoV

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Space-time mixture modelling of public health data

Temporal and spatial analysis of neural tube defects and detection of geographical factors in Shanxi Province

Joint prior distributions for variance parameters in Bayesian analysis of normal hierarchical models

Bayesian image restoration, with two applications in spatial statistics

Interpreting posterior relative risk estimates in disease-mapping studies

The extent of transmission of novel coronavirus in Wuhan, China, 2020

Updated understanding of the outbreak of 2019 novel coronavirus (2019-nCoV) in Wuhan

Preparedness and proactive infection control measures against the emerging Wuhan coronavirus pneumonia in China

Real time data report of epidemic situation

Severe acute respiratory syndrome-associated coronavirus infection

Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak

Distribution of the COVID-19 epidemic and correlation with population emigration from Wuhan, China

The authors thank Andre Kiesel for critical revision of this manuscript. 

None.