key: cord-0059926-iq2gvyj1
authors: Topîrceanu, Alexandru
title: Analyzing the Impact of Geo-Spatial Organization of Real-World Communities on Epidemic Spreading Dynamics
date: 2020-11-25
journal: Complex Networks & Their Applications IX
DOI: 10.1007/978-3-030-65347-7_29
sha: d04cc1c19c55cb9bc619bcca47b386255156d22e
doc_id: 59926
cord_uid: iq2gvyj1

Models for complex epidemic spreading are an essential tool for predicting both the local and global effects of epidemic outbreaks. The ongoing COVID-19 pandemic has shown that many classic compartmental models, such as SIR, SIS, and SEIR, which assume homogeneous mixing of the population, may lead to over-simplified estimates of outbreak duration, amplitude, and dynamics (e.g., waves). This paper addresses the importance of accounting for the organization of society into geo-spatially organized communities (i.e., the size, position, and density of cities, towns, and settlements), which have a profound impact on shaping epidemic dynamics. We introduce a novel geo-spatial population model (GPM) that can be tailored to reproduce the heterogeneous organization of individuals found in the real-world communities of chosen countries. Using a dataset of the world's nations, we highlight important differences between a homogeneous model and GPM in their capability to estimate epidemic outbreak dynamics (e.g., waves), duration, and overall coverage. Results show that community size and density play an important role in the predictability and controllability of epidemics. Specifically, small and dense community systems can either remain completely isolated or show rapid bursts of epidemic dynamics, whereas in larger systems the epidemic size and duration grow proportionally with the number of communities.

Predicting the dynamics of epidemic outbreaks is an important step towards controlling and preventing the spread of infectious diseases.
With the recent COVID-19 pandemic affecting most of the world's regions, considerable scientific effort has been invested in understanding, modeling, and predicting the dynamics of the SARS-CoV-2 virus [8]. Similar global efforts were made in the past for SARS, MERS, Ebola, and even the 1918 flu pandemic, all of which have helped public health officials better prepare for major outbreaks [10, 19, 20]. As a result, current strategies for controlling and eradicating diseases, including COVID-19, are informed by consistent insights into the processes that drive, and have driven, epidemics in the past [1, 16]. A large body of recent research has focused on extending, custom-tailoring, and augmenting standard mass-action mixing models into tools suitable for analyzing COVID-19 [2, 13, 18]. However, in most cases these mathematical models assume homogeneous mixing of the population (i.e., each infected individual has a small chance of spreading infection to every susceptible individual in the population) [2-4, 28]. Accordingly, "flattening the curve"-type solutions have been proposed to decrease the reproduction number R_0 and thus dampen the peak of the daily infection ratio. Assuming homogeneously mixing populations, several notable studies estimate the duration of the current COVID-19 pandemic and the proportion of the population it will infect [2, 4, 13, 17, 18]. By contrast, most pathogens spread through contact networks, such that infection has a much higher probability of spreading to a more limited set of susceptible contacts [15]. Ultimately, governments' actions around the world are based on these scientific predictions, with immense social and economic impact [3]. Over the past decade, a growing number of network science studies have shown the importance of community structure when considering epidemic processes over networks [5, 6, 23, 26, 27].
In this sense, the heterogeneous organization of communities is not a novel concept in network science [24, 32]. Salathé et al. [23] show how community structure affects the dynamics of epidemics, with implications for how networks can be protected from large-scale epidemics. Ghalmane et al. [11] reach similar conclusions in the context of time-evolving network nodes and edges. Shang et al. [26] show that overlapping communities and a higher average degree accelerate spreading. Using a power-law model, Chen et al. [5] show that overlapping communities lead to a higher infection prevalence and to a peak in spreading velocity during the early stages of an emerging infection. Stegehuis et al. [27] suggest that community structure is an important determinant of the behavior of percolation processes on networks, as it can either enforce or inhibit spreading. Taking a slightly different approach, Chung et al. [7] use a multiplex network to model the heterogeneity of Singapore's population and thereby obtain realistic epidemic dynamics. Mobility patterns are an important ingredient for augmenting the realism of complex network models and thus the predictability of epidemic dynamics. Sattenspiel et al. [25] incorporate five fixed mobility patterns into an SIR model to explain a measles epidemic in the Caribbean. Salathé et al. [24] study US contact networks and conclude that heterogeneity is important because it directly affects the basic reproductive number R_0, and that it is realistic enough to assume (contact) homogeneity inside communities (e.g., high schools). This observation supports simplifying a community's network from a complex network to a stochastic block model [14]. Finally, Watts et al.
[32] introduce a synthetic hierarchical block model capable of reproducing multiple epidemic waves, but without any correlation to real-world human settlement organization or to realistic distances between communities. In this paper, we address the issue of modeling mobile heterogeneous population systems, where the community structure is defined by actual real-world geo-spatial data (i.e., the position and size of human settlements). The contributions of our study can be summarized as follows:
- We introduce the geo-spatial population model (GPM) to quantify the duration δ, size ξ, and dynamics of an epidemic, in comparison with a similar homogeneous mixing model and with real COVID-19 data [9]. Our research focuses on community structure and individual mobility rather than on the transmission model; therefore, we incorporate GPM into a classic SIR epidemic model to run numerical simulations.
- We define the population system (e.g., a country) as a stochastic block model (SBM) whose blocks (or communities) are the real-world settlements of a chosen country. Their size and spatial position (latitude, longitude) are set from real-world data.
- We further define original individual mobility patterns based on the population (size) of, and the distance between, any pair of communities. Intuitively, individuals are more likely to move to a larger and/or closer settlement than to a smaller and/or more distant one [22].
- We show that both the number of settlements in the population system and the settlements' density (leading to a more compact or a sparser geo-spatial organization of communities) can directly impact the outbreak duration δ and size ξ.
We introduce the geo-spatial population model (GPM) as an adaptation of the standard stochastic block model [14], where each block, or community, is a human settlement s_i (in a given country or region) characterized by its global position (longitude x(s_i), latitude y(s_i)) and its number of inhabitants Ω_i (i.e., number of individuals). In this paper, we use GPM to model a country's population system rather than that of the entire planet. Thus, the number and size of all settlements are defined by real-world data for a chosen country. Every individual n_a^i from any settlement s_i is characterized by a stochastic mobility function. The probability p_a^i(s_j) that an individual n_a from settlement s_i leaves for another settlement s_j is given by Eq. 1, where Ω_j is the population |s_j| of settlement s_j, d_ij is the Euclidean distance ((Δx² + Δy²)^(1/2)) between the two settlements s_i and s_j, and ψ (psi) is a tunable parameter. While the Haversine formula is often used for distance calculations over the earth's surface, we consider that distances inside most countries (e.g., tens to hundreds of km) are not significantly affected by the earth's curvature. Also note that the reference probability of an individual remaining within its own settlement s_i, for d_ii = 0, becomes p_a^i(s_i) ∝ Ω_i. In the current form of GPM, all individuals from the same settlement have the same mobility probabilities (i.e., p_a^i(s_j) = p_b^i(s_j), ∀ n_a, n_b ∈ s_i). As such, given the probabilities of moving from one settlement to each of the settlements s_0, ..., s_k in the population system, we express the normalized probability p^i (summing to 1, i.e., Σ_k p^i(s_k) = 1) by dividing each reference probability from Eq. 1 by the sum of the reference probabilities over all settlements:

$$ p^{i}(s_j) = \frac{p^{i}_{a}(s_j)}{\sum_{k} p^{i}_{a}(s_k)} \qquad (2) $$

In practice, we find that the probability of an individual leaving its settlement is roughly 0-2% when the home settlement is moderately large (e.g., a city), and 0-50% when the home settlement is relatively small (e.g., a village or small town). Experimental assessment has shown ψ = 0.2 to be a reliable value for the tunable parameter; nevertheless, in a follow-up study ψ could be customized for each settlement using reliable real mobility data from specific countries [12]. An important original contribution of GPM is that it defines stochastic blocks sized and positioned (in a 2D space) according to real geo-spatial data, rather than a synthetic hierarchical construct [32]. To this end, we use data from the Global Rural-Urban Mapping Project (GRUMP v1), revision 01 (March 2017), curated by the Center for International Earth Science Information Network (CIESIN), Columbia University [31]. The GRUMP dataset contains 70,630 entries in CSV format on human settlements from around the world. The information GPM uses to characterize a community is: country, latitude, longitude, population, and name. The dataset is the result of an ongoing large-scale project and is incomplete for some countries: for example, we find only 24 settlements for Bosnia, totaling 1.1M inhabitants, while the real population is 3.3M (i.e., only 33% of the data is available). On the other hand, for Romania, a larger country of 19M inhabitants, we find 864 settlements totaling 15.7M inhabitants (83% of the data is available). Consequently, we filter out all countries with fewer than 50 settlements, as we consider them incomplete population systems. Additionally, we filter out China, India, and the USA because their much larger sizes would skew the results for averagely sized countries, and they therefore deserve a separate analysis. The resulting dataset consists of 96 countries (from American Samoa with 32K inhabitants to Japan with 108M inhabitants).
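A minimal Python sketch of the mobility rule described above: reference probabilities that grow with the target population Ω_j and shrink with the Euclidean distance d_ij, then normalized per home settlement so that each row sums to 1. Because the exact form of Eq. 1 is not reproduced in this text, the kernel `reference_weight` below is a hypothetical gravity-style choice, Ω_j / (1 + d_ij)^ψ, picked only because it reduces to Ω_i for d_ii = 0. The function names and toy coordinates are likewise illustrative. With this assumed kernel, the resulting leave probabilities are not expected to match the 0-2% and 0-50% ranges reported above.

```python
import math


def euclidean(a, b):
    """Planar Euclidean distance sqrt(dx^2 + dy^2) between (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def reference_weight(pop_j, d_ij, psi):
    """HYPOTHETICAL stand-in for Eq. 1 (its exact form is assumed here).

    Grows with the target population pop_j, decays with distance d_ij at a
    rate set by psi, and reduces to pop_j when d_ij == 0 (p_a^i(s_i) ∝ Ω_i).
    """
    return pop_j / (1.0 + d_ij) ** psi


def mobility_matrix(coords, pops, psi=0.2):
    """Row i holds the normalized probabilities p^i(s_k); each row sums to 1."""
    n = len(pops)
    matrix = []
    for i in range(n):
        weights = [
            reference_weight(pops[j], euclidean(coords[i], coords[j]), psi)
            for j in range(n)
        ]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


if __name__ == "__main__":
    # Toy system: a city, a nearby village and a more distant town (made-up numbers).
    coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    pops = [1_000_000, 1_000, 50_000]
    P = mobility_matrix(coords, pops)
    for i, row in enumerate(P):
        print(f"settlement {i}: leave probability = {1.0 - row[i]:.3f}")
```

Replacing the assumed kernel with the actual Eq. 1 only requires changing `reference_weight`; the normalization step is independent of the kernel.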
Figure 1 shows both a conceptual example of computing the GPM mobility probabilities from position and population size and a real-world mapping over the Kingdom of Spain. In Fig. 1b, the modeled population is 33M inhabitants (70% of the real size) placed in 735 settlements, all within a bounding area of 1000 km × 850 km (the Canary Islands are omitted from the figure but are included in the data model). To compare our numerical simulations with real epidemic data, we use the most recent JHU CSSE COVID-19 dataset curated by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [9]. This comprehensive dataset contains time series of the cumulative number of confirmed coronavirus cases, reported daily, for most countries (and some subregions) of the world. From these data we compute the number of new daily cases and derive several important insights that we further investigate through GPM simulations. Figure 2a shows the histogram of the current COVID-19 outbreak size around the world. From the bimodal distribution we conclude that many countries are (still) weakly affected by the pandemic (e.g., ≤10,000 total cases), while another significant proportion are strongly affected (e.g., >100,000 cases). In between, the distribution of outbreak size is relatively flat, similar to the occurrence of measles [32]. In Figs. 2b-d we provide three representative examples of real-world pandemic evolution over the first δ ≈ 200 days (starting January 22nd). Here we underline two important empirical observations: (i) the outbreak sizes (ξ < 1%) are much smaller than many early predictions based on homogeneous mixing, and (ii) the dynamics are much less predictable, being characterized by multiple waves (w_1 to w_3) rather than a single skewed Gaussian-like wave. Moreover, the pandemic duration cannot yet be accurately inferred from the real data.
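The conversion from a cumulative confirmed-case series to daily new cases, and from there to an outbreak size, can be sketched in plain Python as follows. Two choices here are assumptions rather than part of the original method: the first day's value is counted as that day's new cases, and negative day-to-day differences (downward revisions of reported totals) are clipped to zero. The helper names are hypothetical.

```python
def daily_new_cases(cumulative):
    """Turn a cumulative confirmed-case series into daily new cases.

    Assumptions: the first entry counts as new cases on day 0, and negative
    differences caused by downward revisions are clipped to zero.
    """
    new_cases = []
    previous = 0
    for total in cumulative:
        new_cases.append(max(total - previous, 0))
        previous = total
    return new_cases


def outbreak_size(cumulative, population):
    """Size xi: final cumulative case count as a fraction of the population."""
    return cumulative[-1] / population if cumulative else 0.0


if __name__ == "__main__":
    series = [0, 2, 5, 5, 4, 10]  # toy cumulative counts, not real data
    print(daily_new_cases(series))      # [0, 2, 3, 0, 0, 6]
    print(outbreak_size(series, 1000))  # 0.01
```

With clipping, the daily values can sum to more than the final cumulative count when revisions occur (11 versus 10 in the toy series), which is why the size is taken from the last cumulative value instead.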
The numerical simulations running GPM are quantified through the outbreak duration δ and size ξ. All simulations run for a fixed number of t = 1000 iterations (one per day), providing an overview of nearly three years of the epidemic. The duration δ is the number of days (discrete iterations t) from the epidemic onset (iteration t = 0) to the last registered new infection. The size ξ is the proportion of the total population that becomes infected. Table 1 gives an overview of the numerical simulation statistics on the GRUMP dataset (before and after filtering out countries with fewer than 50 settlements). In summary, our simulations do not trigger a pandemic in fewer than 20 of the smallest countries, fewer than 30 countries show a weak pandemic, and about 20 countries exhibit a strong pandemic (in terms of size or duration). Looking at the persistent panel in Table 1, we notice that after keeping only the countries with more than 50 settlements, the average duration δ increases and the average size ξ drops. We believe this is explained by the high number of small countries (101), in which the pandemic may be shorter but have a higher impact. Furthermore, the top 14 countries (lowest panel) with the longest epidemic duration (δ > 270) take on average 446 days to overcome the simulated pandemic and reach an average infection size of ξ = 0.65. To compare our heterogeneously mixing GPM with a standard homogeneously mixing model, we provide Fig. 3 as an intuitive example. Figures 3a,b show the difference in outbreak size distribution for the same population size (i.e., Spain with 33M inhabitants). In Fig. 3c we extend the measurement to all countries, and the obtained ξ distribution is similar to that of the real COVID-19 data in Fig. 2a. That is, GPM enables realistic outbreak simulations of any size 0 ≤ ξ ≤ 1 and duration δ ≥ 0.
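To make the δ and ξ definitions concrete, the following is a minimal discrete-time stochastic SIR sketch over settlement blocks. The transmission details are assumptions, since the text does not specify them: each day every infectious individual visits one settlement drawn from its home row of a mobility matrix (rows summing to 1), mixing inside each settlement is homogeneous, and `beta` and `gamma` are hypothetical rates. A single block with mobility `[[1.0]]` plays the role of the homogeneous mixing baseline. Iteration t = 1..T is stored at index t - 1, with seeding at the onset t = 0.

```python
import math
import random


def binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of Bernoulli trials (adequate for toy sizes)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_sir(pops, mobility, beta=0.3, gamma=0.1, days=1000,
                 seed_settlement=0, seed_infected=1, rng_seed=0):
    """Stochastic SIR over settlements; returns new infections per iteration.

    ASSUMPTION: each day, every infectious resident of settlement i visits a
    settlement j drawn from mobility[i] and exposes the residents of j; mixing
    within a settlement is homogeneous. Index t - 1 holds iteration t.
    """
    rng = random.Random(rng_seed)
    n = len(pops)
    S = list(pops)
    I = [0] * n
    S[seed_settlement] -= seed_infected
    I[seed_settlement] += seed_infected
    daily_new = []
    for _ in range(days):
        # Where do today's infectious individuals make their contacts?
        visitors = [0] * n
        for i in range(n):
            for j in rng.choices(range(n), weights=mobility[i], k=I[i]):
                visitors[j] += 1
        # Infection and recovery draws per settlement.
        new_inf = [binomial(rng, S[j], 1.0 - math.exp(-beta * visitors[j] / pops[j]))
                   for j in range(n)]
        recovered = [binomial(rng, I[j], gamma) for j in range(n)]
        for j in range(n):
            S[j] -= new_inf[j]
            I[j] += new_inf[j] - recovered[j]
        daily_new.append(sum(new_inf))
        if sum(I) == 0:
            # Epidemic over: pad the remaining iterations with zeros.
            daily_new.extend([0] * (days - len(daily_new)))
            break
    return daily_new


def outbreak_metrics(daily_new, total_population, seed_infected=1):
    """Return (delta, xi).

    delta: last iteration t with a new infection (onset is t = 0), 0 if none.
    xi: fraction of the population ever infected, seeds included.
    """
    delta = max((t + 1 for t, c in enumerate(daily_new) if c > 0), default=0)
    xi = (seed_infected + sum(daily_new)) / total_population
    return delta, xi


if __name__ == "__main__":
    pops = [300, 120, 80]                     # toy settlement sizes
    blocks = [[0.98, 0.015, 0.005],           # toy mobility rows, each sums to 1
              [0.20, 0.79, 0.01],
              [0.10, 0.02, 0.88]]
    runs = {"homogeneous": ([sum(pops)], [[1.0]]), "blocks": (pops, blocks)}
    for label, (sizes, matrix) in runs.items():
        daily = simulate_sir(sizes, matrix, days=365)
        print(label, outbreak_metrics(daily, sum(pops)))
```

Comparing the printed (δ, ξ) pairs across several `rng_seed` values is one way to explore the contrast the text draws between a single homogeneous block and a block-structured system; with toy sizes this small, individual runs vary considerably.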
In homogeneously mixing populations, the outbreak is likely to be either short and very strong or completely nonexistent. For the homogeneously mixing scenario in Fig. 3d, a representative epidemic trajectory rises rapidly only once, infecting most of the population in the process (here, ξ = 100%). Conversely, in the GPM examples (Fig. 3e,f), epidemic trajectories exhibit unpredictable rebounds, persist for different durations, and may infect different fractions of the population. The bimodality of the outbreak size distribution further motivates us to analyze the difference between smaller countries (defined as having fewer than 50 settlements in the GRUMP dataset) and larger countries (with more than 50 settlements). The Pearson correlation ρ between a country's actual population and its number of settlements in our dataset is ρ = 0.557 for 'smaller' countries and ρ = 0.953 for 'larger' countries. Figures 4a,b plot the outbreak duration and size against the number of settlements for all 101 'smaller' countries (blue) and 96 'larger' countries. We gain two insightful observations:
1. The two groups of countries can be clearly distinguished in terms of outbreak duration (Fig. 4a). Smaller countries' outbreak duration is shorter and more bounded, within δ < 212 days (7 months, ρ = 0.472). For larger countries, duration increases with the number of settlements (ρ = 0.668).
2. The same distinction is not possible in terms of outbreak size (Fig. 4b). While for larger countries the size still correlates with the number of settlements (ρ = 0.624), smaller countries' outbreak size is unpredictable (ρ = −0.269).
Finally, we analyze the impact of settlement density (i.e., similar to controlling the spatial overlapping of communities) in a population system.
Starting from the default density (= 1) given by the actual geographical positions of settlements, we increase and decrease the density by three orders of magnitude (i.e., from 0.001 to 1000) by proportionally contracting or expanding all settlements' positions. Figure 4c suggests that the outbreak duration is maximized only for the default spacing (density ≈ 1), with an average δ ≈ 302 measured over all countries. In other words, environments that are too dense or too sparse exhibit short-lived epidemics. To better understand what this means, Fig. 4d gives an overview of the outbreak sizes measured over all countries (average ξ = 0.56). Sparse population systems (density < 1) trigger either no epidemic or only very small, short bursts (i.e., there is not enough population to sustain the transmission dynamics). Conversely, denser systems (density > 1) also trigger short epidemics, but strong ones with high coverage. Systems of intermediate density exhibit much longer epidemics, although possibly with a lower size ξ.

Establishing realistic models for the geographic spread of epidemics is still underdeveloped compared to other areas of network modeling, such as online social networks [29], models for the diffusion of information [30], or network medicine [21]. Nevertheless, one of the most distinctive characteristics of many viral outbreaks is their spread across geo-spatially organized human communities. In this paper we investigate the importance of spatially structured real-world communities for predicting epidemic dynamics. The GPM model presented here provides a novel method that may prove useful in better linking complex networks and mathematical epidemic models to the empirical patterns of infectious disease spread across time and space.
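The density sweep behind Fig. 4c,d can be sketched as a rescaling of settlement coordinates. The text only states that positions are contracted or expanded proportionally, so the interpretation used here is an assumption: density is treated as areal, which means linear distances scale by density^(-1/2) (a factor of about 31.6 for density 0.001 and about 0.0316 for density 1000), and positions are scaled about the centroid of all settlements. The function name `rescale_density` is hypothetical; if distances were instead scaled linearly by 1/density, only `factor` would change.

```python
import math


def rescale_density(coords, density):
    """Contract (density > 1) or expand (density < 1) positions about their centroid.

    ASSUMPTION: `density` is an areal density relative to the real layout
    (density = 1), so all pairwise distances are scaled by density ** -0.5.
    Settlement populations are left unchanged.
    """
    if density <= 0:
        raise ValueError("density must be positive")
    if not coords:
        return []
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    factor = density ** -0.5
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in coords]


# Sweep spanning three orders of magnitude in each direction: 0.001 ... 1000.
DENSITIES = [10.0 ** k for k in range(-3, 4)]


if __name__ == "__main__":
    coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]  # toy positions
    for d in DENSITIES:
        scaled = rescale_density(coords, d)
        print(f"density={d:g}: distance(s0, s1) = {math.dist(scaled[0], scaled[1]):.4f}")
```

The rescaled positions would then replace the real ones when computing the distances d_ij of the mobility model, leaving all populations Ω_i untouched.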
Our numerical simulations confirm that smaller-scale environments (e.g., countries with fewer settlements) exhibit less predictable epidemic dynamics in terms of outbreak size ξ, but their duration δ is generally noticeably shorter (within ≈200 days) than that of larger environments. Indeed, for larger environments, the outbreak duration and size increase roughly linearly with the number of settlements (ρ ≈ 0.62-0.67). In general, our results illustrate the qualitative point that epidemics, when they succeed, occur on multiple scales, resulting in longer duration, repeated waves, and hard-to-predict size. A planned next step is to include diverse isolation measures in our model, in the form of mobility restrictions between settlements and reduced infectiousness inside settlements (e.g., by wearing masks), and to study their feasibility for limiting the infection size in the long term. Furthermore, there are several extensions to GPM worth investigating in future studies. For example, environmental factors associated with a settlement's location can have important effects on transmission risk, as they can vary greatly over short distances [25]. The model can also consider larger-scale populations (e.g., continental), where mobility is given by international travel logs. Between settlements, real data on mobility patterns can be used when available [12]. Finally, our mobility model may be further detailed to consider contact between individuals along the way to a target settlement (e.g., by car, bus, or train) instead of direct transfer (e.g., by plane). Taken together, we believe our model represents a timely contribution to better understanding and tackling the current COVID-19 pandemic, which has proven hard to predict with many existing homogeneously mixing population models.

[1] Directly transmitted infectious diseases: control by vaccination.
[2] A mathematical model for the spatiotemporal epidemic spreading of COVID-19.
[3] What will be the economic impact of COVID-19 in the US? Rough estimates of disease scenarios.
[4] Social network-based distancing strategies to flatten the COVID-19 curve in a post-lockdown world.
[5] Epidemic spreading on networks with overlapping community structure.
[6] On community structure in complex networks: challenges and opportunities.
[7] Modelling Singapore COVID-19 pandemic with a SEIR multiplex network model.
[8] Labs scramble to produce new coronavirus diagnostics.
[9] An interactive web-based dashboard to track COVID-19 in real time.
[10] Modeling the SARS epidemic.
[11] Centrality in complex networks with overlapping community structure.
[12] Understanding urban human activity and mobility patterns using large-scale location-based data from online social media.
[13] Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts.
[14] Stochastic blockmodels: first steps.
[15] The implications of network structure for epidemic dynamics.
[16] Modeling Infectious Diseases in Humans and Animals.
[17] Interventions to mitigate early spread of COVID-19 in Singapore: a modelling study.
[18] Early dynamics of transmission and control of COVID-19: a mathematical modelling study.
[19] Transmission dynamics and control of severe acute respiratory syndrome.
[20] Superspreading and the effect of individual variation on disease emergence.
[21] Network science meets respiratory medicine for OSAS phenotyping and severity prediction.
[22] STEPS - an approach for human mobility modeling.
[23] Dynamics and control of diseases in networks with community structure.
[24] A high-resolution human contact network for infectious disease transmission.
[25] A structured epidemic model incorporating geographic mobility among regions.
[26] Epidemic spreading on complex networks with overlapping and non-overlapping community structure.
[27] Epidemic spreading on complex networks with community structures.
[28] The benefits and costs of using social distancing to flatten the curve for COVID-19.
[29] Genetically optimized realistic social network topology inspired by Facebook.
[30] Tolerance-based interaction: a new model targeting opinion formation and diffusion in social networks.
[31] Center for International Earth Science Information Network (CIESIN), Columbia University. Gridded Population of the World, Version 4 (GPWv4). NASA Socioeconomic Data and Applications Center (SEDAC). Atlas of Environmental Risks Facing China Under Climate Change.
[32] Multiscale, resurgent epidemics in a hierarchical metapopulation model.