key: cord-0218589-g2fqmpfs
authors: Liu, Kang; Yin, Ling; Xue, Jianzhang
title: Impact of initial outbreak locations on transmission risk of infectious diseases in an intra-urban area
date: 2022-03-23
journal: nan
DOI: nan
sha: a2c5fc878a8a49668be1ab99a313185345b7b6e5
doc_id: 218589
cord_uid: g2fqmpfs

Infectious diseases usually originate from a specific location within a city. Due to the heterogeneous distribution of population and public facilities, and the structural heterogeneity of the human mobility network embedded in space, infectious diseases that break out at different locations cause different transmission risks and control difficulties. This study aims to investigate the impact of initial outbreak locations on the risk of spatiotemporal transmission and to reveal the driving forces behind high-risk outbreak locations. First, integrating mobile phone location data, we built an SLIR (susceptible-latent-infectious-removed)-based meta-population model to simulate the spreading process of an infectious disease (i.e., COVID-19) across fine-grained intra-urban regions (i.e., 649 communities of Shenzhen City, China). Second, based on the simulation model, we evaluated the transmission risk caused by different initial outbreak locations using three proposed indexes: the number of infected cases (CaseNum), the number of affected regions (RegionNum), and the spatial diffusion range (SpatialRange). Finally, we investigated the contribution of different influential factors to the transmission risk via machine learning models. The results indicate that different initial outbreak locations cause similar CaseNum but different RegionNum and SpatialRange. To keep an epidemic from spreading quickly to more regions, it is necessary to prevent outbreaks in locations with high population-mobility flow density. To keep an epidemic from spreading over a larger spatial range, remote regions where residents have long daily trip distances need attention.
These findings can help understand the transmission risk and driving forces of initial outbreak locations within cities and support precise prevention and control strategies in advance.

Over the past decades, outbreaks of chronic or emerging infectious diseases have posed a growing threat to public health, the global economy, and human society (Bloom and Cadarette, 2019). Many infectious diseases like COVID-19 are transmitted from individual to individual through physical contact. Such transmission is often fostered and accelerated in cities with dense populations, well-developed transportation, and highly dynamic human mobility (Mao and Bian, 2010). A review of historical and current epidemics shows that infectious diseases breaking out in cities usually originate from a specific location and then spread over a larger spatial range. For example, the COVID-19 epidemic in Wuhan was initially

The group-level models are mainly implemented by differential equations, which can be built for the whole study area or for each of its sub-regions. In the latter case, the sub-region models are coupled through the human mobility between them. Such coupled models are called meta-population models and are currently among the most popular model types. For instance, by integrating human mobility data collected by the Tencent Location-based Service, Li et al. (2020b) developed an SEIR (susceptible-exposed-infectious-removed)-based meta-population model to simulate the spatiotemporal dynamics of infections in 375 Chinese cities. This study found that substantial mildly and asymptomatically

communities in Shenzhen City based on mobile phone location data and point-of-interest (POI) data. Then, we proposed three risk evaluation indexes (i.e., the number of infected cases, the number of affected regions, and the spatial diffusion range) to assess the transmission risk caused by different initial outbreak sites.
Finally, taking the risk evaluation indexes as dependent variables, we applied random forest models to compare the contributions of potential influential factors (e.g., population and human mobility flow) to the transmission risk. Our study can not only help understand the transmission mechanism of infectious diseases, but also assist governments in developing forward-looking and precise prevention and control strategies.

Shenzhen is one of the most developed metropolitan cities in China, with a permanent population of 17.56 million in an area of 1997.47 km². Considering its high population density and dynamic human mobility, we selected Shenzhen as a typical city for the case study. Figure 1 shows the 649 communities of Shenzhen City. The spatiotemporal transmission process was simulated at this fine spatial scale, and each community served in turn as an initial outbreak location in the case study.

The mobile phone location data used in the study were provided by China Unicom, one of the three major telecommunication operators in China, which holds the location records of more than 400 million subscribers. An original location record includes the user's anonymous ID and the timestamp and coordinates of the cell tower to which she/he was connected. We entrusted SmartSteps, a company held by China Unicom, to process the original mobile phone location data into several types of data: residential population, working population, daily population flow, and stay-all-day population. Since China Unicom covers only part of the city's population, SmartSteps expanded the sample according to the operator's market share in Shenzhen. The total population of the city after sample expansion is 16.61 million, which is close to the census figure of 17.56 million. The definitions and processing methods of residential population, working population, daily population flow, and stay-all-day population are described as follows.
The residential community of an individual is defined as the community where the individual spent the longest time during the nighttime (21:00 to 07:00) over a month (October 2019 in our case study). Based on this definition, the residential population of each community can be obtained. To ensure that an individual is a long-stay resident rather than a short-stay visitor, we counted only individuals who had appeared in the residential community for more than two weeks (14 days) in the month.

The working community of an individual is defined as the community where the individual spent the longest time during the daytime (09:00 to 17:00) over a month (October 2019 in our case study). Based on this definition, the working population of each community can be obtained.

The daily population flow indicates the home-based daily population mobility flow between communities. Figure 2 illustrates the OD-based trips and home-based trips of an individual during a day. An OD-based trip is a trip from one location (the origin) to the next location (the destination), while a home-based trip takes the home location as the origin and each location that the individual visited during the day as a destination. On this basis, the daily population flow is defined as the total number of home-based trips from one community to another during a day. We averaged the daily population flow data over 14 days (October 14 to 27, 2019) for use in the case study.

The stay-all-day population of a community is defined as the number of residents who did not leave their residential community during a day. Similarly, we averaged the stay-all-day population data over 14 days (October 14 to 27, 2019) for use in the case study.

POIs depict places where people gather and conduct their daily activities at a fine spatial and taxonomic granularity.
In this study, we collected about 1.7 million POIs of Shenzhen City from the Open Platform of AutoNavi, one of the largest web mapping, navigation, and location-based service providers in China. Each POI record includes its name, type, coordinates, and address. All the POIs were spatially joined with the communities in ArcGIS, and we then counted the number of POIs inside each community for use in the case study.

Taking COVID-19 as the typical infectious disease, we built an SLIR-based meta-population model to simulate the spatiotemporal transmission across fine-grained intra-urban regions (i.e., the 649 communities of Shenzhen in our case study). The model parameters are listed in Table 1. Except for β, all the parameter values were determined according to the existing literature mentioned above. In addition, h_ij indicates the daily population flow from community i to community j. If i = j, h_ii denotes the stay-all-day population of community i. h_ij was further normalized to ensure that ∑_j h_ij = 1, ∀ i.

Considering that physical contact and viral transmission between individuals usually occur in various activity places such as restaurants, cinemas, and markets, we assumed that communities with a higher density of activity places have a larger effective infection rate. Therefore, the effective infection rate (β_i) of community i varies with its POI density and is expressed in terms of β, a basic effective infection rate of the city.

We used the following steps to estimate the only unknown parameter β. 1) Given a specific value of β, we obtained the epidemic curve of daily new infections (including symptomatic and asymptomatic cases) of the city by simulation using the model. 2) R0 of the epidemic curve was then estimated using the exponential-growth (EG) method implemented in the "R0" package for the R programming language (Obadia et al., 2012). 3) Through the above two steps, we can obtain multiple pairs of (β, R0).
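Step 1 relies on running the model for a given β. The model equations are not reproduced in this text, so the following is only a minimal sketch of a generic discrete-time SLIR meta-population simulation, not the authors' implementation. It assumes that residents of community i spend the day in community j with probability h_ij, that infection happens at the destination, and that the latent and infectious periods (`latent_days`, `infectious_days`) take placeholder values; all function names, parameter values, and the toy inputs are hypothetical.

```python
import math


def normalize_rows(flows):
    """Scale each origin row so that sum_j h[i][j] == 1 (row totals must be positive)."""
    normalized = []
    for row in flows:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def simulate_slir(flows, population, beta, seeds, days,
                  latent_days=3.0, infectious_days=5.0):
    """Deterministic daily SLIR meta-population sketch (assumed form).

    flows[i][j]: raw daily flow from community i to j (diagonal = stay-all-day).
    beta[j]:     effective infection rate inside community j.
    seeds[i]:    initial infectious cases living in community i.
    Returns (curve of daily new infectious cases, final compartments).
    """
    n = len(population)
    h = normalize_rows(flows)
    inf = [float(s) for s in seeds]
    sus = [population[i] - inf[i] for i in range(n)]
    lat = [0.0] * n
    rem = [0.0] * n
    curve = []
    for _ in range(days):
        # People, and infectious people, present in each destination j today.
        present = [sum(h[i][j] * population[i] for i in range(n)) for j in range(n)]
        inf_present = [sum(h[i][j] * inf[i] for i in range(n)) for j in range(n)]
        # Force of infection experienced by anyone spending the day in j.
        lam = [beta[j] * inf_present[j] / present[j] if present[j] > 0 else 0.0
               for j in range(n)]
        # Residents of i are exposed according to where they spend the day.
        new_lat = [sus[i] * (1.0 - math.exp(-sum(h[i][j] * lam[j] for j in range(n))))
                   for i in range(n)]
        new_inf = [lat[i] / latent_days for i in range(n)]
        new_rem = [inf[i] / infectious_days for i in range(n)]
        for i in range(n):
            sus[i] -= new_lat[i]
            lat[i] += new_lat[i] - new_inf[i]
            inf[i] += new_inf[i] - new_rem[i]
            rem[i] += new_rem[i]
        curve.append(sum(new_inf))
    return curve, {"S": sus, "L": lat, "I": inf, "R": rem}


# Toy example: three communities, outbreak seeded only in community 0.
flows = [[80, 15, 5], [10, 85, 5], [20, 20, 60]]
population = [1000.0, 2000.0, 1500.0]
curve, state = simulate_slir(flows, population, beta=[0.4, 0.4, 0.4],
                             seeds=[10, 0, 0], days=30)
```

Repeating such a run for several values of β and estimating R0 from each resulting curve would yield the (β, R0) pairs used for calibration; the R0 estimation itself is not sketched here.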
By fitting the relationship between β and R0, we can obtain the β value corresponding to R0 = 2.5 (WHO and China, 2020). In our case, to generate the epidemic curves, 100 initial symptomatic cases were seeded in various communities approximately in proportion to their population sizes. As shown in Figure 3, the linear relationship between β and R0 is well fitted, with a coefficient of determination R² of up to 0.999. When R0 = 2.5, β is equal to 0.405.

To evaluate the transmission risk caused by different initial outbreak locations, we placed 100 initial cases in each community in turn and simulated the spatiotemporal transmission process of COVID-19 using the SLIR-based meta-population model. Similarly, 100 initial cases were seeded in various communities approximately proportional to their population sizes. We defined three indexes to evaluate the spatiotemporal transmission risk based on the simulation results and analyzed the transmission risk caused by different initial outbreak locations.

The three indexes evaluate the spatiotemporal transmission risk from different aspects.
(1) Number of infected cases (CaseNum): the number of infected cases in the whole city on the t-th day after the outbreak.
(2) Number of affected regions (RegionNum): the number of regions (i.e., communities in our case study) in which infected cases have appeared by the t-th day after the outbreak.
(3) Spatial diffusion range (SpatialRange): the geographical range affected by infected cases by the t-th day after the outbreak. As shown in Figure 4, different initial outbreak locations may result in very different spatial diffusion ranges, even though they affect the same number of regions.
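The three indexes can be illustrated with a small sketch. CaseNum and RegionNum follow directly from the definitions. For SpatialRange the exact formula is not reproduced in this text, so the sketch assumes a case-weighted root-mean-square distance from the case-weighted barycenter, computed on planar coordinates. The function names, the `threshold` parameter, and the toy coordinates are hypothetical.

```python
import math


def case_num(cases):
    """Total number of infected cases over all communities."""
    return sum(cases)


def region_num(cases, threshold=1.0):
    """Number of communities with at least `threshold` cases (threshold is an assumption)."""
    return sum(1 for c in cases if c >= threshold)


def spatial_range(cases, coords):
    """Case-weighted RMS distance from the case-weighted barycenter (assumed form).

    coords[i] is the planar (x, y) position of community i, e.g. in km.
    """
    total = sum(cases)
    if total <= 0:
        return 0.0
    cx = sum(c * x for c, (x, _) in zip(cases, coords)) / total
    cy = sum(c * y for c, (_, y) in zip(cases, coords)) / total
    spread = sum(c * ((x - cx) ** 2 + (y - cy) ** 2)
                 for c, (x, y) in zip(cases, coords)) / total
    return math.sqrt(spread)


# Four toy communities on a line (km). Both scenarios affect two regions,
# but the cases in scenario B are much farther apart.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
scenario_a = [50, 50, 0, 0]
scenario_b = [50, 0, 50, 0]

for name, cases in (("A", scenario_a), ("B", scenario_b)):
    print(name, case_num(cases), region_num(cases), spatial_range(cases, coords))
```

In this toy setting both scenarios have the same CaseNum and RegionNum, while the assumed SpatialRange is ten times larger in scenario B, which is the kind of contrast the third index is meant to capture.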
To measure the spatial diffusion range, we first calculated the barycentric coordinates of the infected cases that had appeared across the city by the t-th day after the outbreak, and then calculated the spatial diffusion range.

By placing 100 initial infected cases (i.e., seeds) in each community in turn, we simulated the spatiotemporal transmission process of COVID-19 in Shenzhen and evaluated the transmission risk caused by each community using the three indexes defined above. Figure 6 (1) and (2) imply that population-concentrated and developed regions result in higher transmission risk; the more specific driving factors behind those high-risk initial outbreak locations are investigated in section 5.

Figure 7 shows the transmission risk assessed by the spatial diffusion range (SpatialRange). It indicates that an epidemic breaking out in a remote location may result in a larger spatial diffusion range than one breaking out in a geographically central location. We suspect that this is related to the spatial heterogeneity of human travel behavior: residents living in central regions have better access to various places (e.g., workplaces and shopping malls), so they travel shorter distances and are less likely to bring the virus to distant places. In contrast, residents living in remote regions usually need to travel long distances to satisfy their daily or specific needs, so they are more likely to transmit the disease over a larger spatial range.

The above analyses are mainly based on the risk maps shown in Figures 4-6 and on the authors' knowledge of Shenzhen City as residents. In section 5, we further investigate the potential influential factors behind high-risk initial outbreak locations quantitatively.

In this section, we established regression models to reveal the driving factors behind high-risk initial outbreak locations.
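A minimal sketch of this kind of analysis is given here, assuming scikit-learn's `RandomForestRegressor` (the text does not name the software used) with the study's settings of 100 trees and a 75%/25% train/test split. The data are synthetic stand-ins, not the study's variables: four random features replace the real influential factors, and the target is constructed so that only the first feature matters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 649 "communities", 4 hypothetical factors.
rng = np.random.default_rng(0)
n_communities, n_features = 649, 4
X = rng.random((n_communities, n_features))
# Hypothetical risk index driven by the first factor plus small noise.
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(n_communities)

# 75% training set, 25% testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Random forest regressor with 100 decision trees.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate with Pearson's r and the coefficient of determination.
pred = model.predict(X_test)
pearson_r = np.corrcoef(y_test, pred)[0, 1]
r2 = r2_score(y_test, pred)

# Impurity-based importances, normalized to sum to 1.
importances = model.feature_importances_
print(pearson_r, r2, importances)
```

On real data, each of the three indexes would take the place of `y` in turn, and the columns of `X` would be the candidate influential factors.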
The dependent variables of the models are the three indexes (i.e., CaseNum, RegionNum, and SpatialRange), respectively, while the independent variables include population-related factors, population-flow-related factors, and POI-related factors. The 16 independent variables are detailed in Table 2.

The importance of a variable is computed as the total reduction of the model criterion (here measured by mean squared error) brought by that variable. For the experiments, we divided the 649 communities of Shenzhen City into a training set (75%) and a testing set (25%) and used 100 decision trees in the random forest regressor. The Pearson correlation coefficient (r) and the coefficient of determination (R²) were used to evaluate the performance of the models.

As for the transmission risk evaluated by the spatial diffusion range (SpatialRange), the average travel distance of residents plays a significant role on different days, which implies that an epidemic breaking out in a location with longer trip distances spreads over a larger spatial range. Figure 7 also implies that those locations are usually suburban areas. Residents living in such regions usually need to travel long distances to satisfy their daily needs, so they are more likely to transmit the disease over a larger spatial range. Therefore, to keep the epidemic from spreading over a larger spatial range, remote regions where residents have long daily travel distances also need attention. From Figure 8, we can also see that population density (both residential and working), POI quantity and density, and some centrality measures of the mobility network (e.g., betweenness in the weighted mobility network) do not show an obvious influence on transmission risk compared to the factors mentioned above.

Infectious diseases breaking out within a city usually originate from a specific location and then spread over a larger spatial range.
As the distributions of demographic, geographic, and socioeconomic elements are highly heterogeneous in space, infectious diseases breaking out at different locations cause different levels of transmission risk. Therefore, it is necessary to systematically and quantitatively investigate the influence of initial outbreak locations on the transmission risk of infectious diseases. To achieve this goal, we first built an SLIR-based meta-population model to simulate the spatiotemporal spreading process of COVID-19, and then analyzed the transmission risk caused by each community of Shenzhen as the initial outbreak location using three indexes: the number of infected cases (CaseNum), the number of affected regions (RegionNum), and the spatial diffusion range (SpatialRange). Finally, the relative impact of potential influential factors on the transmission risk was estimated by random forest regressors.

The experimental results lead to the following findings. (1) Different initial outbreak locations cause similar CaseNum but different RegionNum and SpatialRange. (2) Initial outbreak locations with higher daily flow density cause larger RegionNum. To keep the epidemic from spreading quickly to more regions, it is necessary to prevent outbreaks in such locations. (3) Initial outbreak locations where residents have longer daily trip distances (usually the suburbs of the city) cause larger SpatialRange. Therefore, to keep the epidemic from spreading over a larger spatial range, suburban areas also require attention. These findings can not only help understand the mechanism of transmission risk and the driving forces behind high-risk initial outbreak locations, but also help make precise prevention and control strategies in advance.

The highlights or contributions of this study can be summarized as follows.
(1) On the topic of infectious-disease transmission, most studies have focused on predicting spatiotemporal transmission trends or evaluating the effectiveness of various non-pharmaceutical interventions (NPIs). Our study instead investigated the role of initial outbreak locations, which is also important for understanding the transmission mechanism and beneficial for prevention and control. (2) Most infectious-disease transmission models have been built at large spatial scales such as countries, provinces, states, and cities. Our study worked at a fine-grained intra-urban scale and integrated real human mobility data between intra-urban regions, which can depict the transmission process more precisely. (3) We proposed three indexes (i.e., the number of infected cases, the number of affected regions, and the spatial diffusion range) to evaluate the spatiotemporal transmission risk of infectious diseases from multiple perspectives; these indexes can also be used in future studies and practice. (4) We systematically simulated, evaluated, compared, and analyzed the transmission risk caused by each of the intra-urban regions as the initial outbreak location, and revealed the influential factors behind high-risk locations, which is of great importance for understanding the transmission mechanism and guiding prevention and control.

The main limitation of this study is that we took only one city as the study case. If data for more cities can be obtained in the future, we would like to conduct experiments for multiple cities and test whether these conclusions hold for different cities.
Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19
Human mobility networks, travel restrictions, and the global spread of 2009 H1N1 pandemic
Infectious Disease Threats in the Twenty-First Century: Strengthening the Global Response
Compliance and containment in social distancing: mathematical modeling of COVID-19 across townships
Prevention and control measure of COVID-19 in China 063b0d2a6/files/4b92097245cb48a391ea4f6b40707ae5.pdf
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures
Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts
Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race
Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing, China
Integrated vaccination and physical distancing interventions to prevent future COVID-19 waves in Chinese cities
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
Staying at Home Is a Privilege: Evidence from Fine-Grained Mobile Phone Location Data in the United States during the COVID-19 Pandemic
A taxonomy for agent-based models in human infectious disease epidemiology
Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study
The effect of human mobility and control measures on the COVID-19 epidemic in China
Effect of non-pharmaceutical interventions to contain COVID-19 in China
The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
Classification and regression by randomForest
Analytical methods and applications of spatial interactions in the era of big data
Spatial-temporal transmission of influenza and its health risks in an urbanized area
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)
The R0 package: a toolbox to estimate reproduction numbers for epidemic outbreaks
Inference of start time of resurgent COVID-19 epidemic in Beijing with SEIR dynamics model and evaluation of control measure effect
WHO & China. Report of the WHO-China Joint Mission on Coronavirus Disease
Spatial considerations for the allocation of prepandemic influenza vaccination in the United States
Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions
A data driven agent-based model that recommends non-pharmaceutical interventions to suppress Coronavirus disease 2019 resurgence in megacities
Effects of human mobility restrictions on the spread of COVID-19 in Shenzhen, China: a modelling study using mobile phone data. The Lancet Digital Health

This research is supported by the National Natural Science Foundation of China