key: cord-178791-cywjp5jh
authors: Chin, Wei Chien Benny; Bouffanais, Roland
title: Spatial super-spreaders and super-susceptibles in human movement networks
date: 2020-05-11
journal: nan
DOI: nan
sha: 
doc_id: 178791
cord_uid: cywjp5jh

As lockdowns and stay-at-home orders start to be lifted across the globe, governments are struggling to establish effective and practical guidelines to reopen their economies. In dense urban environments with people returning to work and public transportation resuming full capacity, enforcing strict social distancing measures will be extremely challenging, if not practically impossible. Governments are thus paying close attention to particular locations that may become the next cluster of disease spreading. Indeed, certain places, like some people, can be "super-spreaders." Is a bustling train station in a central business district more or less susceptible and vulnerable as compared to teeming bus interchanges in the suburbs? Here, we propose a quantitative and systematic framework to identify spatial super-spreaders and the novel concept of super-susceptibles, i.e. respectively, places most likely to contribute to disease spread or to people contracting it. Our proposed data-analytic framework is based on the daily-aggregated ridership data of public transport in Singapore. By constructing the directed and weighted human movement networks and integrating human flow intensity with two neighborhood diversity metrics, we are able to pinpoint super-spreader and super-susceptible locations. Our results reveal that most super-spreaders are also super-susceptibles and that, counterintuitively, busy peripheral bus interchanges are riskier places than crowded central train stations. Our analysis is based on data from Singapore, but can be readily adapted and extended for any other major urban center. It therefore serves as a useful framework for devising targeted and cost-effective preventive measures for urban planning and epidemiological preparedness.

The ongoing outbreak of the infectious Coronavirus disease 2019 (Covid-19, also known as nCoV-2019 and caused by the pathogen SARS-CoV-2) is progressing worldwide with a reported number of cases surpassing 3 million 1 as of April 29, 2020. The pathology of Covid-19 and its global spread remain a critical challenge worldwide 2, 3. As of this writing, no approved treatment for Covid-19 has been identified and a vaccine is expected to be 12 months to 18 months away from being widely available. Based on our current medical knowledge, Covid-19 is more infectious than the 2003 Severe Acute Respiratory Syndrome (SARS, caused by SARS-CoV-1) 4, 5. Its main transmission pathway is through respiratory droplets, and infected patients experience an incubation period of at most 14 days (possibly longer in some reported cases) before exhibiting a set of flu-like symptoms 6, 7. The asymptomatic latent period of Covid-19 and its highly contagious nature have made the spread of Covid-19 extremely difficult to control and prevent 5. The outbreak of Covid-19 started in December 2019 in the city of Wuhan, Hubei Province of China. Following the domestic outbreak in mainland China, the disease started spreading worldwide in January 2020 (or even as early as December 2019), leading to a declaration of Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO) 8. Until this declaration of PHEIC, a total of 7,818 cases were confirmed, of which only 82 were outside China 8. In February 2020, Covid-19 continued spreading internationally, primarily in East and Southeast Asia, as well as in some European countries having extensive air-travel routes to Wuhan and China. The first wave of international spreading took place during the critical period of the Chinese New Year holiday, during which China experiences the largest human migration every year 9.
Countries that were first hit by this outbreak include Thailand, Japan, Singapore, South Korea, France, Germany and the United Kingdom 10. Those imported cases quickly turned into local transmissions in most of these countries. In March 2020, as the outbreak reached exponential growth in Italy, Spain, France, and Germany, the epicenter of Covid-19 moved to Europe 11, which became the second wave of this outbreak and international pandemic. That second wave triggered a near-complete lockdown in most of the largest European countries. The purpose of these country-level or city-level lockdowns was to introduce and enforce strict social distancing measures, which were hoped to bring a fast reduction in the spread of imported and local community transmissions. The central assumption behind these drastic public-health measures was that restricting human movement is key to controlling the spread of Covid-19 in communities, between cities and countries. 'Physical distancing' has been coined as a better term to super-susceptible, it would require particular attention since it would pose the risk of simultaneously being a hotbed of infection and disease spreader. Identifying these places would therefore be critical in the fight against infectious diseases such as Covid-19. In this article, we report a study aimed at systematically identifying the spatial super-spreaders and spatial super-susceptibles in the spatial human network of the city-state of the Republic of Singapore. The particular choice of Singapore stems from it having: (1) been hit early in the first wave of infection directly from Wuhan and with a systematic tracking and mapping of infected people 46, (2) one of the highest population densities in Southeast Asia, (3) dense and highly interconnected human mobility and transportation networks 38, 47, and (4) detailed and reliable data for the construction of spatial networks 48.
As mentioned earlier, a spatial super-spreader is a locus with a high outflow of people, i.e. a place from which many people originate and then move to a wide variety of places. In the same vein, a spatial super-susceptible is a destination for a large number of individuals originating from different places. Hence, this work proposes a systematic data-centric framework enabling the identification of spatial locations that should be targeted by public health agencies in the event of an epidemic such as Covid-19. With these critical places identified, policy makers would then be able to implement cost-effective targeted responses, with prevention and intervention measures directly connected to the level of vulnerability of a given location.

This section is divided into three parts: descriptions of (a) the study area, (b) the flow data, and (c) the metrics and indexes.

This study focuses on the public transportation flow network in Singapore. The city-state primarily occupies an island located in Southeast Asia with a total surface area of about 724.2 km². As of 2019, the total population of Singapore is about 5.703 million people (with a population density of about 7,875.68 per km²), of which 70.6% are residents (citizens and permanent residents) and 29.4% are non-residents (foreigners with long-term passes). According to the General Household Survey 2015 49, about 62.7% of students and 64.1% of working persons rely on bus or rail transport services to travel to school or work, thereby making public transportation the primary mode of transportation in Singapore. As a result, the density of people using public transport during the morning and evening peaks is high, and the distance between people at the stations and in vehicles is short. Hence, a direct consequence of the high population density combined with a high rate of public transportation use is that physical distancing is extremely challenging, if not impossible, during regular operations. This issue is a serious concern when facing the spreading of a highly contagious disease such as Covid-19.

To analyze the data, we consider the administrative subzone level spatial boundaries (from the Singapore Master Plan 2014 50) as the analysis unit. The residential population density (from the General Household Survey 2015 49) is shown in Fig. 1. There are five regions (Central, West, North, North East, and East), with 55 planning areas and 323 subzones 50. Some of the subzones contain no residential population (white areas), which include airports and airbases (e.g. Changi Airport in the East Region) and industrial parks or ports (e.g. Jurong Island and Bukom at the south of the West Region, and Simpang North and South in the North Region). Although these places lack residential population, they are the workplaces (destinations) of a large number of individuals. The darker areas are home to a large number of people; in other words, a large number of journeys start from and end at these locations.

We used the origin-destination (OD) ridership data of bus and train to generate the public transport flow networks. The OD ridership data is systematically collected by the Singapore Land Transport Authority (LTA is a government statutory board under the Ministry of Transport) through API calls 48 . In this study, we used the ridership from November 2019 to January 2020. In terms of temporal resolution, the OD ridership data provides hourly passenger flows between each pair of bus stops or train stations (including mass rapid transit and light rail transit). The raw data are then aggregated into weekdays (a total of 21 days in November 2019, 22 days in December 2019 and 23 days in January 2020) or weekends (9 days in both November and December 2019 and 8 days in January).

As the raw data records the flow between OD pairs of bus stops or train stations, we spatially aggregate the data into flows between subzones, according to the bus stop or train station locations. A total of 303 subzones (out of a total of 323) contained at least one bus stop or one train station. These subzones then form the nodes (303 nodes) of the weighted directed network, with flows between nodes corresponding to the weights of directed edges. A total of 30,331 edges were found, with a vast majority (30,043 edges or 99%) being edges across subzones, and less than 1% (exactly 288 edges) being within-subzone flows (i.e. corresponding to self-loops from the network perspective). Given the very limited number of such intra-subzone flows, they were ignored in this study.
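The spatial aggregation described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function name `aggregate_to_subzones`, the stop identifiers and the toy ridership values are all hypothetical, and the stop-to-subzone lookup is assumed to be available from a prior point-in-polygon step.

```python
from collections import defaultdict

def aggregate_to_subzones(stop_flows, stop_to_subzone):
    """Sum stop-level OD flows into subzone-level directed edge weights.

    stop_flows: {(origin_stop, destination_stop): number of riders}
    stop_to_subzone: {stop: subzone} (assumed to be precomputed)
    """
    edges = defaultdict(float)
    for (origin_stop, dest_stop), riders in stop_flows.items():
        o = stop_to_subzone.get(origin_stop)
        d = stop_to_subzone.get(dest_stop)
        if o is None or d is None:
            continue  # stop could not be assigned to a subzone
        if o == d:
            continue  # intra-subzone flow (self-loop), ignored as in the text
        edges[(o, d)] += riders
    return dict(edges)

# Hypothetical toy data: stops s1 and s2 lie in subzone A, stop s3 in subzone B.
stop_flows = {
    ("s1", "s2"): 120,  # within subzone A, dropped as a self-loop
    ("s1", "s3"): 300,
    ("s2", "s3"): 50,
    ("s3", "s1"): 280,
}
stop_to_subzone = {"s1": "A", "s2": "A", "s3": "B"}
network = aggregate_to_subzones(stop_flows, stop_to_subzone)
```

In this toy example the two flows from subzone A to subzone B are merged into a single directed edge, while the flow within subzone A is discarded.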

To carry out this study, we introduce two indexes, namely the spreader index (SPI) and the susceptible index (SUI), to search for the spatial super-spreaders (SSP) and spatial super-susceptibles (SSS). Both indexes are quantitatively determined using two key elements: (1) the local strength of human in- and outflows, and (2) the diversity of their respective neighborhoods 15. The local strength of in- and outflows for a given location is the number of people coming to or leaving from the location, i.e. respectively the weighted in-degree and weighted out-degree of the corresponding node. The neighborhood diversity is captured and quantified by two concepts: (1) the diversity of zones and (2) the diversity of coreness. The diversity of zones 38, 51 refers to people coming from different parts of the city. As for the diversity of coreness 52, 53, it refers to people coming either from the core or from the periphery of the country. More details about what constitutes core and periphery are given in Step 3 below. We applied this analysis framework to the Singapore public transport flow network, and identified the SSP and SSS using the SPI and SUI indexes. The population flow patterns are expected to be different for weekdays and weekends. Thus, the flow data were separated into weekday and weekend ones. The calculation flow of the spatial spreader and spatial susceptible indexes is detailed in Fig. 2. The first part consists in aggregating the bus and train OD flow data to subzones as mentioned earlier. That top layer provides the main data for the calculation, i.e. two weighted and directed networks: weekday and weekend flow networks. These networks are subsequently used to compute three network characteristic measurements, including degree centrality (Step 1), community detection (Step 2), and k-shell decomposition (Step 3), which are described in full detail in the following subsections.
The degree centrality is used as a proxy for the intensity of the local out- and inflows, whereas the community detection and k-shell decomposition results enable the computation of neighborhood diversity, including zone-entropy and coreness-entropy as introduced below. Finally, in the last step (Step 4), the three network characteristics are used to calculate the SUI and SPI.

Step 1: Degree centrality

The degree centrality in this study includes both the non-weighted and weighted in- and out-degrees. The non-weighted and weighted versions of the degree centrality represent different concepts in terms of network characteristics. The non-weighted in-degree and out-degree are the number of links (or edges) that point to and from a subzone, respectively. This non-weighted degree centrality measures the number of relationships that a particular subzone has. As for the weighted in-degree and out-degree, they correspond to the summation of incoming and outgoing flows for a given subzone, respectively. This weighted version of degree centrality indicates the total strength of a node in terms of gathering flows or spreading flows without accounting for the actual number of (incoming or outgoing) edges.

In this study, the weighted degree centrality is used to represent the local intensity of nodes for the calculation of the SUI and SPI. The weighted degree centrality is scaled within the unit interval. On the other hand, both non-weighted and weighted degree centralities are used in the weighted k-shell decomposition analysis performed as Step 3.
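As an illustration of Step 1, the following Python sketch computes the four degree measures from a dictionary of directed weighted edges. The text only states that the weighted degree is scaled within the unit interval; dividing by the maximum value is an assumption made here for illustration, and the function names and toy edges are hypothetical.

```python
from collections import defaultdict

def degree_centralities(edges):
    """Return non-weighted and weighted in-/out-degrees for every node.

    edges: {(origin, destination): flow weight}
    """
    nodes = {n for pair in edges for n in pair}
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    w_in, w_out = defaultdict(float), defaultdict(float)
    for (o, d), w in edges.items():
        out_deg[o] += 1   # one more outgoing link for the origin
        in_deg[d] += 1    # one more incoming link for the destination
        w_out[o] += w     # total outgoing flow of the origin
        w_in[d] += w      # total incoming flow of the destination
    # Nodes without any in- or out-edge get an explicit zero.
    return tuple({n: m[n] for n in nodes} for m in (in_deg, out_deg, w_in, w_out))

def scale_to_unit(values):
    """Scale to [0, 1] by dividing by the maximum (assumed scaling scheme)."""
    top = max(values.values(), default=0.0)
    if top == 0:
        return {n: 0.0 for n in values}
    return {n: v / top for n, v in values.items()}

# Hypothetical toy network with three subzones.
edges = {("A", "B"): 350.0, ("B", "A"): 280.0, ("A", "C"): 70.0}
in_deg, out_deg, w_in, w_out = degree_centralities(edges)
nw_in = scale_to_unit(w_in)    # normalized weighted in-degree
nw_out = scale_to_unit(w_out)  # normalized weighted out-degree
```

Here subzone A has two outgoing links but its weighted out-degree is the sum of both flows, which shows how the two versions of the degree capture different information.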

Step 2: Zone-entropy

This study uses a community detection method (MapEquation algorithm 51) to identify the zones from the flow network, instead of using the administrative spatial boundaries (i.e. the boundaries of planning areas and regions as defined by the Singapore Government in its Master Plan 2014 50) that were designed and selected for governance and political purposes. The communities from this flow network analysis capture both the strength and direction of flows, which reflect the spatial activity of people derived from their daily commuting/mobility behaviors 38. As the communities are identified separately for the weekday and weekend networks, the resulting zones may differ between weekdays and weekends.

MapEquation is used to identify the communities in the flow networks 51. This algorithm considers the direction and weight of edges to identify the strongly connected nodes in a directed and weighted network. It differs from modularity-based community detection methods in that it emphasizes the strength of flows within communities, i.e. higher flow intensities within a community than between communities (flows cycling within communities). MapEquation captures the effect of direction while ensuring that a large amount of flow is kept within the community. Moreover, the communities obtained with MapEquation are used as the zones within which strong human flows circulate, which is quantified with the concept of zone-entropy. Note that to maintain the spatial continuity of the communities, we integrate a distance decay effect 54 in the flow intensity calculation (see Eq. (1)) before running MapEquation:

$$F'(o, d) = \frac{F(o, d)}{\mathrm{distance}(o, d)} \qquad (1)$$

where F(o, d) is the number of people moving from the origin subzone o to the destination subzone d, distance(o, d) is the distance between the two subzones, and F'(o, d) is the actual flow intensity incorporating the distance decay effect.
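A minimal Python sketch of this distance decay step is given below. It assumes that the flow is divided by the distance between the two subzones (inverse-distance decay) and that the distance is the Euclidean distance between subzone centroids; both the centroid coordinates and the function name `apply_distance_decay` are hypothetical choices for illustration.

```python
import math

def apply_distance_decay(edges, centroids):
    """Divide each flow by the distance between origin and destination.

    edges: {(origin, destination): flow}, without self-loops
    centroids: {subzone: (x, y)} in any consistent length unit (assumed input)
    """
    decayed = {}
    for (o, d), flow in edges.items():
        dist = math.dist(centroids[o], centroids[d])  # Euclidean distance
        decayed[(o, d)] = flow / dist
    return decayed

# Hypothetical toy data: the two centroids are 5 length units apart.
edges = {("A", "B"): 350.0, ("B", "A"): 280.0}
centroids = {"A": (0.0, 0.0), "B": (3.0, 4.0)}
decayed_edges = apply_distance_decay(edges, centroids)
```

Because self-loops are removed beforehand, the distance is never zero as long as distinct subzones have distinct centroids.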

First, we run the MapEquation algorithm on the two networks (weekdays and weekends), and identify the zone (set of communities Z = {Z_1, Z_2, ..., Z_max} with Z_j = {n | ∀ n belongs to community j}) to which each subzone (node) belongs. Then, for each subzone, the incoming/outgoing neighbors' zones are retrieved from the results together with the weights of incoming/outgoing edges (w(j, i) or w(i, j)). The neighbors' zone information and flow weights are used to calculate the normalized entropy (H^zone_neigh(i)) using Eqs. (2)-(4). The entropy is normalized using the total number of zones in the network to enable a comparison between nodes. Note that the zone-entropy value ranges between 0 and 1 as a consequence of this normalization.
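The zone-entropy calculation can be sketched in Python as follows. This sketch assumes a Shannon entropy over the flow-weighted shares of the neighbors' zones, divided by the logarithm of the total number of zones; this is one natural reading of the description above and not necessarily the exact form of Eqs. (2)-(4). The function name and toy values are hypothetical.

```python
import math
from collections import defaultdict

def normalized_zone_entropy(neighbor_flows, zone_of, n_zones):
    """Normalized entropy of the zones reached by (or feeding) a subzone.

    neighbor_flows: {neighbor: flow weight} for one direction (in or out)
    zone_of: {subzone: zone label} from community detection
    n_zones: total number of zones in the network
    """
    flow_per_zone = defaultdict(float)
    for neighbor, weight in neighbor_flows.items():
        flow_per_zone[zone_of[neighbor]] += weight
    total = sum(flow_per_zone.values())
    if total == 0 or n_zones < 2:
        return 0.0
    shares = [f / total for f in flow_per_zone.values() if f > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(n_zones)  # assumed normalization

# Hypothetical toy data: outgoing flows of one subzone towards B, C and D.
zone_of = {"B": "Z1", "C": "Z2", "D": "Z2"}
out_flows = {"B": 100.0, "C": 50.0, "D": 50.0}
h_two_zones = normalized_zone_entropy(out_flows, zone_of, n_zones=2)
h_four_zones = normalized_zone_entropy(out_flows, zone_of, n_zones=4)
h_single = normalized_zone_entropy({"C": 10.0, "D": 5.0}, zone_of, n_zones=2)
```

Flows split evenly over all zones give the maximum value, whereas flows directed to a single zone give zero, whatever the number of neighboring subzones.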

Step 3: Coreness-entropy

The k-shell decomposition is a method to label the coreness (k-shell levels) of nodes in a network based on the connectivity structure 14. Because the edges of the flow networks are weighted, we use the weighted k-shell decomposition 52, which is an extended version that considers both the number of links (degree) and the weights of links while labeling coreness. The coreness of a location indicates the position of the location in the range from periphery (low k-shell levels) to core (high k-shell levels). In a population flow network, the core locations indicate the common origins or destinations for a large number of passengers.

In this study, we first run the weighted k-shell decomposition using the non-weighted and weighted in-/out-degree (from Step 1) to calculate the in-/out-k-shell levels for each subzone. Then, the k-shell levels are grouped into core (in-/out-core) or periphery (in-/out-non-core) using the median value as a cutoff. Finally, for each node, its incoming/outgoing neighbors' core/non-core information is integrated with the flow weights to calculate the so-called coreness-entropy (H^core_neigh(i)) as defined in Eqs. (5)-(7). The entropy is normalized using the total number of coreness levels (binary levels here, i.e. C = {core, periphery}), to facilitate the comparison of the results between nodes. Note that the coreness-entropy value ranges between 0 and 1 after this normalization.
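The last two operations of this step (median cutoff and coreness-entropy) can be sketched in Python as below. The weighted k-shell decomposition itself is not implemented here: the k-shell levels are taken as a given input with hypothetical values. Labeling a node as core when its level is strictly above the median is an assumption, as is the Shannon form of the entropy; with two classes and base-2 logarithms the entropy is already normalized to the unit interval.

```python
import math
from statistics import median

def core_labels(kshell_levels):
    """Split nodes into core/periphery using the median k-shell level."""
    cutoff = median(kshell_levels.values())
    # Assumption: strictly above the median counts as core.
    return {n: ("core" if k > cutoff else "periphery")
            for n, k in kshell_levels.items()}

def coreness_entropy(neighbor_flows, labels):
    """Binary entropy (base 2) of the flow share exchanged with core neighbors."""
    total = sum(neighbor_flows.values())
    if total == 0:
        return 0.0
    p_core = sum(w for n, w in neighbor_flows.items()
                 if labels[n] == "core") / total
    if p_core in (0.0, 1.0):
        return 0.0  # all flows go to a single class
    return -(p_core * math.log2(p_core)
             + (1 - p_core) * math.log2(1 - p_core))

# Hypothetical k-shell levels, standing in for the weighted decomposition output.
kshell_levels = {"A": 1, "B": 2, "C": 3, "D": 4}
labels = core_labels(kshell_levels)
h_mixed = coreness_entropy({"B": 100.0, "C": 100.0}, labels)
h_core_only = coreness_entropy({"C": 10.0, "D": 30.0}, labels)
```

A subzone sending half of its flow to core neighbors and half to peripheral ones reaches the maximum coreness-entropy of one.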

Step 4: Spatial spreader & susceptible indexes

The spatial spreader index (SPI) and spatial susceptible index (SUI) are based on the general concepts of the framework proposed by Fu et al. 15. However, the exact indices are largely modified to account for the specificities of our study. Specifically, the SPI and SUI calculations are based on a geometric average of three key network metrics. The SPI (see Eq. (8)) is the geometric average of the local normalized weighted out-degree (NWOutDegree(i)), the zone-entropy of outgoing neighbors (H^zone_OutNeigh(i)), and the out-coreness-entropy of the outgoing neighbors (H^core_OutNeigh(i)). To understand this particular definition, one may for instance consider the case for which a node's SPI is high: this node has a high volume of outgoing flows (high local intensity), half of the flows are directed to the core area and the other half to the non-core area (periphery), and these flows are equally divided among different zones (high out-neighbors' zone-entropy). In other words, a high SPI subzone has a large number of travelers originating from there, and these individuals are on their way to both core and periphery places, which are located in various zones. Therefore, with such a high SPI value, disease spreading would be facilitated within a short period of time. The flow intensity and diversity measurements are all normalized in the unit interval, and consequently the geometric average also varies between zero and one.

The spatial susceptible index SUI (see Eq. (9)) is constructed in the same way as the SPI, with the exception that we consider all incoming components as opposed to outgoing ones: the local normalized weighted in-degree (NWInDegree(i)), the zone-entropy of incoming neighbors (H^zone_InNeigh(i)), and the in-coreness-entropy of incoming neighbors (H^core_InNeigh(i)). Again, the concept associated with the SUI is better understood when considering a subzone with large incoming flows: half of the flows are coming from the core area and the other half from the non-core area, and these flows are equally coming from different zones. In other words, this subzone is a destination for a large number of travelers originating from various zones and their origins of movement contain both core and periphery areas. Therefore, a high SUI subzone is expected to be a place where travelers would be more vulnerable and sensitive to being infected. Like the SPI, the SUI varies in the unit interval.
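Since both indexes are described as the geometric average of three quantities in the unit interval, their calculation can be sketched in a few lines of Python. The function name `geometric_index` and the numerical values are hypothetical; the same function serves for the SPI (outgoing components) and the SUI (incoming components).

```python
def geometric_index(nw_degree, zone_entropy, coreness_entropy):
    """Geometric average of three indicators, each expected in [0, 1]."""
    for value in (nw_degree, zone_entropy, coreness_entropy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all indicators must lie in the unit interval")
    return (nw_degree * zone_entropy * coreness_entropy) ** (1.0 / 3.0)

# Hypothetical indicator values for three subzones.
spi_high = geometric_index(0.9, 0.8, 0.95)  # high on all three indicators
spi_busy_but_uniform = geometric_index(1.0, 0.0, 1.0)  # all flows to one zone
sui_medium = geometric_index(0.5, 0.5, 0.5)
```

A useful property of the geometric average is that a single indicator equal to zero brings the whole index to zero, so a very busy subzone whose flows all stay within one zone cannot score high.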

The spatial distribution of the non-weighted/weighted in-degree and out-degree for weekdays is shown in Fig. 3. It appears that the patterns for the non-weighted and weighted in-degrees (top row) are similar to those of their out-degree counterparts (bottom row). This points to the fact that inflows and outflows are fairly balanced, which is expected for daily aggregated data associated with steady human movements. For the non-weighted degree measurements (left column), the high in- and out-degree subzones appear to be mainly concentrated in the East, North East and Central regions, whereas the West and North have a higher number of lower degree subzones. These results are correlated with the distribution of human density in Singapore, namely high to very high in the East, North East and Central regions, and lower in the West and North of the island. For weighted degree measurements (right column), the East region has more high degree subzones; the number of high degree subzones drops in the Central region; and the North, North East, and West regions have relatively more high degree subzones when compared with their non-weighted counterparts.

Figure 3. Non-weighted (left column) and weighted (right column) degree for weekdays; the top row refers to the in-degree and the bottom row to the out-degree. The townships are separated into four groups using the 25%, 50% and 75% percentiles as breaks, thereby giving the "low", "mid-low", "mid-high" and "high" intensities.

The distribution of the non-weighted measurements for weekends is essentially the same as for weekdays. Figure 4 displays the differences in weighted in- and out-degree between weekdays and weekends. Most subzones are in the lightest green or purple colors, thereby indicating that their weekday and weekend degree measurements differ only slightly (by a factor of less than 1.3). These subzones have a similar number of people using public transportation during weekdays and weekends. Only a few subzones are in dark colors, indicating larger changes as compared to weekdays. These subzones reveal a notably different usage of public transportation between weekdays and weekends: usage on weekdays is twice as large as on weekends (dark purple), or the other way round (dark green).

Figure 4. Differences of weighted in- and out-degree between weekdays and weekends. Subzones in green indicate weekends having higher degrees, whereas subzones in purple indicate weekdays having higher degrees. The colors range from light to dark with increasing difference in degree.

As discussed in the Materials and Methods Section, a critical component of our network analysis is based on community detection. Figure 5 shows the spatial distribution of communities for both weekdays and weekends. The MapEquation algorithm with the provided data reveals 17 different communities for both the weekday and the weekend flow networks. Most communities are spatially continuous as the flow data is integrated with the inverse of the distance. However, some exceptions exist in both weekday and weekend communities (e.g. weekday and weekend community #2). The spatially-continuous patterns are expected given the spatial embedding of our networks, and they indicate that interactions between closer subzones are effectively stronger. On the other hand, the few spatially-split communities appear to be the by-product of a strong flow of human movement between two spatially-distant locations with sparser spaces between them. Although weekday and weekend communities are different (some are split and others have different boundaries), overall they show some notable similarities (e.g. weekday community #11 and weekend community #10). This observation can be attributed to two particular features of Singapore: (1) given the limited available land, Singapore has a dense and compact urban landscape with a high level of mixed-use areas, be they residential, industrial and/or commercial, and (2) a non-negligible fraction of the working population is active on Saturdays, which creates a high flow of travelers with the same commuting patterns as during weekdays. For instance, in the Western region, weekday communities #4 and #16 are extremely similar to weekend communities #3 and #17. These particular communities are fairly large with a heavy mixed use of residential and industrial areas, where people have similar daily activities throughout the week.
The North East Region (NER) contains three similar communities during weekdays and weekends (community #2 (upper part), #14, and part of #11 during weekdays, and the similar patterns of #2 (upper part), #13, and #10 during weekends). The North Region (NR) is split into multiple communities (community #2 (lower part), #7, #8, #10, #11, #15 during weekdays, and #2 (lower part), #7, #9, #10, #15 during weekends). The identified communities #1, #2, #5, #6, #9, #12 during weekdays, and communities #1, #2, #5, #11, #16 during weekends are similar and fit well with the Central Region (CR), which hosts the central business district of Singapore. The community detection results show that the boundaries of human activity can change between weekdays and weekends. Community #4 on weekends appears to be an area resulting from the merger of communities #5, #17 and part of #8 during weekdays. This indicates that the area has stronger human movement interactions during weekends than weekdays, probably because the area is mostly residential with few shopping places providing daily needs products and necessities. In summary, the human movement boundaries are not fixed to a static pattern, and they are usually smaller than the known regional/administrative boundaries.

The spatial distribution of the core area is shown in Fig. 6 . As detailed in the Materials and Methods Section, the calculation of coreness is separated into two parts for each network, one of which uses the (weighted or unweighted) in-degree, and the other the (weighted or unweighted) out-degree. Hence, two sets of coreness results (outgoing core area and incoming core area) are obtained for each network. Some areas are identified as core in both incoming and outgoing directions (red subzones in Fig. 6 ), some are core for either incoming (pink subzones in Fig. 6 ) or outgoing (purple subzones in Fig. 6 ) but not both. However, the vast majority of areas are core ones from both the incoming and outgoing flows perspective. These red areas happen to have a notable overlap with residential areas with a high population density, thereby indicating that places where people live would always have high incoming and outgoing flow: a core area of human movement and commuting.

The calculation of the spreader and susceptible indexes requires access to the local normalized in-degree and out-degree centrality, as well as the incoming and outgoing neighborhood zone-entropy (Eqs. (2)-(4)) and coreness-entropy (Eqs. (5)-(7)). Note that these three key indicators (local weighted degree, zone-entropy and coreness-entropy) are in the unit interval, i.e. with variations between zero and one. Figure 7 shows the local out- and in-degree (left column), the outgoing and incoming neighborhood zone-entropy (central column) and coreness-entropy (right column) of the weekday (first two rows) and weekend (bottom two rows) flow networks. The spatial distribution shows notable differences between centrality, zone-entropy and coreness-entropy.

Figure 6. Distribution of core/non-core areas from the weighted k-shell decomposition. The coreness in (a) refers to weekday flow data, while in (b) it is for weekend flow data. Red-colored areas are for subzones identified as both incoming and outgoing core areas, purple-colored areas refer to solely outgoing core subzones, and pink-colored subzones highlight solely incoming core subzones.

In addition, high levels of local weighted out- and in-degree are mostly concentrated in the East, North East, and Central Regions. As for the zone-entropy, high levels are primarily located in the North and Central Regions, while high levels of coreness-entropy are mostly found in subzones in the North Region. Essentially, most of the subzones have high levels of one, two or even three of these key indicators. However, only subzones with high levels of all three indicators are SSP or SSS. The distribution of the spreader index (SPI) and susceptible index (SUI) of each subzone for weekdays and weekends is shown in Fig. 8. All four distributions suggest a similar Poisson-like type of distribution, with a mean value between 0.255 and 0.265 (solid vertical lines). The fact that these mean values are very close for both indexes on weekdays and weekends is in line with our previous comment related to an expected balance between incoming and outgoing flows of human movement. However, for our analysis the locations of interest are the outliers corresponding to large SPI and/or SUI values. Using the interquartile range (IQR) method, the outliers are identified as the subzones located above Q3 + 1.5 × IQR, which is shown in Fig. 8 as a reference level (dotted vertical lines). The subzones that lie between Q3 and Q3 + 1.5 × IQR are categorized as secondary-spreaders or secondary-susceptibles.
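The IQR-based categorization can be sketched in Python as follows. The quartile estimator is not specified in the text; the "inclusive" method of `statistics.quantiles` is used here as an assumption, and the category names, function name and index values are hypothetical.

```python
from statistics import quantiles

def classify_by_iqr(index_values):
    """Label each subzone as 'super', 'secondary' or 'regular'.

    index_values: {subzone: SPI or SUI value}
    """
    q1, _, q3 = quantiles(list(index_values.values()), n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)  # Q3 + 1.5 x IQR
    labels = {}
    for subzone, value in index_values.items():
        if value >= upper_fence:
            labels[subzone] = "super"      # super-spreader or super-susceptible
        elif value >= q3:
            labels[subzone] = "secondary"  # between Q3 and the upper fence
        else:
            labels[subzone] = "regular"
    return labels

# Hypothetical SPI values for nine subzones.
spi = {"s1": 0.10, "s2": 0.20, "s3": 0.22, "s4": 0.24, "s5": 0.26,
       "s6": 0.28, "s7": 0.30, "s8": 0.36, "s9": 0.60}
categories = classify_by_iqr(spi)
```

With these toy values, Q1 = 0.22 and Q3 = 0.30, so the upper fence is 0.42: only subzone s9 is labeled as super, while s7 and s8 fall in the secondary category.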

This analysis reveals that a non-negligible number of locations exhibit large SPI and/or SUI values, thereby contributing to our identification process of spatial super-spreaders and spatial super-susceptibles.

The spatial distributions of super-spreaders (SSP) and super-susceptibles (SSS) are shown in Fig. 9 for weekdays and in Fig. 10 for weekends. For weekday flow movement (see Fig. 9), 9 subzones are identified as SSP (red-colored zones in Fig. 9(a)) corresponding to SPI ≥ Q3 + 1.5 × IQR; 11 subzones are identified as SSS (red-colored zones in Fig. 9(b)) corresponding to SUI ≥ Q3 + 1.5 × IQR. It is worth noting that 9 subzones overlap in both figures, thereby corresponding to both spatial super-spreaders and super-susceptibles (subzones a) to i) in both figures, shown as red-colored subzones with a purple border). This indicates that most of the subzones with the highest SPI values also have the highest SUI values, and vice versa. In Fig. 9(a), all identified SSP are also identified as SSS. In Fig. 9(b), two subzones, j) Khatib and k) Tampines East, are identified as SSS only, with a lower SPI (Q3 ≤ SPI < Q3 + 1.5 × IQR). The weekend distributions exhibit slightly different patterns. There are 9 subzones identified as SSP on weekends, with 8 of them also being identified as SSP on weekdays (subzones a) to h) in Fig. 10(a)); none of the weekend SSP has an SPI below Q3 in the previous figure. Similarly, all weekend SSS are either super- or secondary-susceptibles on weekdays, and vice versa. A total of 13 SSS are found with the weekend human movement network (Fig. 10(b)); 9 of them (subzones a) to i)) are also weekend SSP, and 11 overlap with the weekday SSS, while the other two subzones, j) Boulevard and k) Bukit Batok Central, are promoted from weekday secondary-susceptible subzones (pink subzones in Fig. 9(b)). This result further confirms that the SPI and SUI are not dramatically different between weekdays and weekends.

There are eight subzones (a) to i) except h) in Fig. 9, and a) to h) in Fig. 10 Central), and three at the North Region (Sembawang Central, Woodlands Regional Centre and Yishun West) are identified as both SSP and SSS (in red) in both weekdays and weekends. During weekdays, most of the identified SSP or SSS areas belong to the regional core that contains a higher density of human activity. The eight SSP and SSS can be separated into two types. The first type consists of five subzones (a), c), e), f), and i) in Fig. 9), which have a high population density; the second type consists of the other three subzones (b) Jurong Gateway, d) Maritime Square, and h) Woodlands Regional Centre in Fig. 9), associated with a lower population density. The subzones of the first type are typical residential areas, where the intensity of human activity is high due to the extensive need to travel out during the daytime and travel back in the evening. On the other hand, the subzones of the second type are regional hubs of public transportation, which naturally attract large population flows. public transport facilities along with numerous commercial buildings (shopping centers).

One counter-intuitive observation can be made from Fig. 9 and Fig. 10: the CBD contains fewer SSP and SSS than one might expect. The CBD of Singapore is located in the southern central part of the Central Region, and the intensity of human activity within the CBD area is high. However, as shown in Fig. 7, most of the subzones in the CBD have either a low weighted degree or a low neighborhood coreness-entropy. The low weighted degree probably originates in the small size of these subzones, which limits the catchment of incoming and outgoing flows. As for the low coreness-entropy, we trace it to the fact that a majority of the people are circulating within the CBD, which is mainly composed of core areas (Fig. 6). This result indicates that the CBD workplaces are less influential in terms of quickly spreading the disease to the rest of the city/island, but that a contagious disease would quickly spread inside the CBD area as a consequence of its strong internal flows. In summary, the key influential areas are clearly identified as being the regional transport hubs, which connect the residential areas with the rest of the country.

The concept of super-spreader was originally introduced in the field of social network analysis to identify the most influential persons or nodes within a given social network. These persons could be opinion leaders, trend setters, or public figures within a group of people 14, 55. Furthermore, the concept of super-spreader individuals has been borrowed by epidemiologists to identify and study the abnormally high spreading activity of a small group of individuals 13, 56 in large populations during an epidemic outbreak.

While previous studies focused on the identification of super-spreaders within a social network (nodes are individuals and edges represent the existence of interactions between two persons, i.e. binary edges), this study focused instead on spatial networks of population flow, with nodes representing physical locations and weighted/directed edges representing flows of human movement. This study sought to extend the concept of super-spreader to spatial interaction networks, with the objective of identifying possible spatial super-spreader locations, i.e. a set of locations that have the most influential effects in terms of disease spreading. The concept and calculation method were also reversed to uncover another group of critical locations: the most vulnerable places, defined as super-susceptibles.

Our results based on large-scale data analytics show that most of the SSP are also SSS. This is reasonable and somewhat expected given the nature of the daily population flow network. Specifically, since we are considering daily-aggregated data, the number of people leaving a place can be expected to be of the same order as the number of people going to this place, i.e. we are in the presence of balanced commuting flows: the larger the outgoing flow intensity, the larger the incoming flow intensity. Based on the results, the places with intense flows have a higher potential to be both SSP and SSS, and this is captured by the directed nature of the networks and the incorporation of the weighted in-degree or out-degree in our calculations. It is worth noting that our results are in good agreement with previous studies based on the k-shell decomposition method: the core nodes of a social group tend to be, in general, the most influential ones 14, 22.
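To make the notion of balanced daily flows concrete, the following minimal sketch computes the weighted in-degree and out-degree of each node from a list of origin-destination records. The record layout, the function name, and the trip counts are hypothetical; how the actual dataset is structured, and whether intra-subzone trips are counted, are not specified here and are treated as assumptions.

```python
from collections import defaultdict

def weighted_degrees(od_flows):
    """Sum trips per node from (origin, destination, trips) records.

    Returns (in_strength, out_strength) as plain dicts keyed by node.
    """
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for origin, destination, trips in od_flows:
        out_strength[origin] += trips      # flow leaving the origin
        in_strength[destination] += trips  # flow arriving at the destination
    return dict(in_strength), dict(out_strength)

# Hypothetical daily-aggregated flows between three subzones.
flows = [("A", "B", 100), ("B", "A", 90), ("A", "C", 10), ("C", "A", 15)]
in_s, out_s = weighted_degrees(flows)
print(in_s["A"], out_s["A"])  # incoming and outgoing totals for subzone A
```

In this toy example, subzone A has incoming and outgoing totals of the same order (105 versus 110 trips), which is the situation described above for daily-aggregated commuting flows.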

Besides the local incoming and outgoing flow intensities, this study also considers two critical neighborhood diversities of these networks: the zone-entropy and the coreness-entropy. The diversity of the neighborhood is especially important when identifying multiple super-spreaders from a network 15, 28. The zone-entropy measures whether the outgoing flows are spread across many zones within the city-state. For instance, if the outgoing flows from a given place converge to one zone only, this place can only affect one of the zones in Singapore, and its influential power is therefore weak. Conversely, if human movement originating from one place flows to many zones across the country, its influential power is relatively high. In addition, the coreness-entropy captures the diversity of flows to or from core or periphery areas. If the flows are all directed towards only the periphery or only the core, the influential power of the place is somewhat limited to this particular type of area. Conversely, if human movement flows to both core and periphery areas, an outbreak at this place could quickly affect and spread to both core and periphery areas. These two diversity metrics complement one another and are combined in the calculation framework to differentiate places with a high density of flows into strongly and weakly influential places (see Materials & Methods).
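As an illustration of the diversity idea described above, the sketch below computes a Shannon entropy over the shares of a place's flow that go to each group of neighbors, where a group can be a zone or a core/periphery class. This is only a sketch under assumptions: the precise definitions of the zone-entropy and coreness-entropy are given in Materials & Methods and may differ (e.g. in normalization or logarithm base), and all names and numbers below are hypothetical.

```python
import math
from collections import defaultdict

def flow_entropy(flows_by_neighbor, group_of):
    """Shannon entropy (natural log) of flow shares aggregated by group.

    flows_by_neighbor: dict mapping neighbor node -> trips to (or from) it.
    group_of: dict mapping neighbor node -> group label (zone, or core/periphery).
    """
    totals = defaultdict(float)
    for neighbor, trips in flows_by_neighbor.items():
        totals[group_of[neighbor]] += trips
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0  # no flow, no diversity
    entropy = 0.0
    for group_total in totals.values():
        if group_total > 0:
            share = group_total / grand_total
            entropy -= share * math.log(share)
    return entropy

# Hypothetical zone membership of three neighboring subzones.
zone_of = {"s1": "West", "s2": "North", "s3": "West"}
concentrated = flow_entropy({"s1": 80, "s3": 20}, zone_of)  # all flow to one zone
spread = flow_entropy({"s1": 50, "s2": 50}, zone_of)        # even split over two zones
print(concentrated, spread)
```

A place whose flows all end in a single zone gets an entropy of zero, whereas an even split over two zones gives ln 2; the same function applies to core/periphery labels by passing a different `group_of` mapping.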

This study enables us to establish a list of subzones that have a strong capability in terms of disease spreading, as well as a list of subzones that are more vulnerable, i.e. places with a high risk of contagion. In summary, the identified subzones are found to be mainly in the core area of residential and transportation hubs. These places have high population density and activity, such as transportation hubs or community hubs. Therefore, these places should be targeted by public health agencies, with higher resource allocations and disease monitoring aimed at prevention and intervention. For example, public health agencies could consider these locations when planning to set up body temperature checkpoints, to provide personal hygiene toolkits, or to set up advertisements promoting appropriate behaviors to counteract the ongoing epidemic. Moreover, since these locations are more vulnerable and more influential, they should receive more attention when setting up differentiated policies, such as the temporary closure of some businesses or restrictions on large-scale human activities, as opposed to a blanket lockdown across the country.

The proposed network analysis framework rests upon the integration of the local flow intensity with two neighborhood diversity measures (zone and coreness) to assess the effective spreading ability of particular locations. From the theoretical perspective, the proposed framework considers weighted and directed interactions between nodes (places) to identify super-spreaders and super-susceptibles. From the practical perspective, this study presents a quantitative and systematic framework to identify the key influential and vulnerable locations based on public transport flow data, which are usually available from most transportation agencies in metropolitan areas.

It is worth noting that there are several limitations to this study. First, our analysis is limited to human flow associated with the use of public transportation, which is high in places like Singapore or continental European cities but could be much lower in other urban areas with far less developed public transportation networks, such as in the United States. In addition, our data only include ridership of buses and trains and miss out on other important means of transportation, including taxis, private-for-hire vehicles (cars, motorcycles, shuttle buses or vans), and active transportation (walking, bicycle, skateboard, scooter, personal mobility devices, etc.). Some of the subzones currently do not have bus stops or train stations. However, as mentioned previously, the use of public transportation by bus and train in Singapore is fairly high (more than 60% of daily commuting), which supports the relevance of the obtained results as being representative of key human movement patterns.

Second, Singapore is an island country with its northern national border connected to Malaysia through two land checkpoints. Unfortunately, these cross-border flows are not included in this study. Many workers and students commute daily between Singapore and the state of Johor in Malaysia. There are some dedicated bus services directly connecting stations in Johor Bahru, Malaysia and various places across Singapore, including Woodlands in the North Region, Jurong East in the West Region, and Bugis in the Central Region. Since these data were not included, the in/out-flows of these places in Singapore are certainly underestimated.

Third, inter-mode trip transfers and bus transfers are not captured in the dataset used to carry out our study. Trip transfers between Mass Rapid Transit (MRT) lines, i.e. passengers changing lines at some interchanges, are captured from the tap-in and tap-out records. However, the OD data for buses only record the direct flow between bus stops: the records contain only the tap-in and tap-out information for each bus ride, so transfers between bus services are not captured in the data. Likewise, data about changing from bus to train and vice versa are unfortunately not available. Therefore, we can only capture trips on direct services, and this naturally limits the movement of travelers to the existing direct bus/train services.

Fourth, the short-time-scale dynamics throughout a day are ignored. Indeed, we considered daily-aggregated data. However, a higher temporal resolution could be considered (say on an hourly basis), which could reveal different patterns of SSP and SSS. The temporal evolution of the SUI and SPI indices will be the topic of a future study.

In summary, we have developed for the first time a framework allowing the identification of spatial super-spreader and super-susceptible locations. We believe that our results and analysis could be extended in two key directions. First, our analysis would benefit from being complemented by working with epidemiologists specialized in simulations of disease spreading through human contact networks. This would integrate our results with differential spreading across more or less vulnerable places. Specifically, the dynamic patterns of disease propagation could be observed from the simulation models, and thus the effects of the SSP and SSS could be quantified in terms of their actual contamination rate in the population. Second, the geographic, demographic, and socioeconomic characteristics of the spatial super-spreaders and super-susceptibles could be accounted for and included in our analysis using statistical models, to identify the potential social and physical environmental factors that made these locations super-spreaders and super-susceptibles.

In conclusion, it is well known that dealing with the reopening of economies and cities after a blanket lockdown requires a finely calibrated approach from governments. Although we used the Singapore public transport flow data to build these networks as a case study, similar analyses can readily be carried out using the exact same process in order to uncover the SSP and SSS in any large urban center. Our data-driven methodology, analysis and results offer an effective way of devising targeted and localized preventive measures when lifting stay-at-home orders. Such targeted measures for vulnerable locations are also critical in order to optimize government resources in the face of economic decline.

The datasets used for this study, generated from the Singapore LTA database 48, are available from the following Spatial_Spreader_Susceptible_data repository: https://github.com/wcchin/Spatial_Spreader_Susceptible_data.

Coronavirus disease 2019 (COVID-19) Situation Report 100

Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding

Clinical features of patients infected with 2019 novel coronavirus in Wuhan

Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China

Pattern of early human-to-human transmission of Wuhan

Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia

Novel Coronavirus (2019-nCoV) Situation Report 12

WHO. Novel Coronavirus (2019-nCoV) Situation Report 9

Coronavirus disease 2019 (COVID-19) Situation Report 59

Coronavirus disease 2019 (COVID-19) Situation Report 47

Coronavirus disease 2019 (COVID-19) Situation Report 40

Dynamical processes on complex networks

Epidemic Spreading in Scale-Free Networks

Identification of influential spreaders in complex networks

Identifying Super-Spreader Nodes in Complex Networks

Identifying multiple influential spreaders based on generalized closeness centrality

Super-spreaders in infectious diseases

Searching for Superspreaders: Identifying Epidemic Patterns Associated with Superspreading Events in Stochastic Models

The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak

Are the different layers of a social network conveying the same information?

Ranking the spreading influence in complex networks

Ranking spreaders by decomposing complex networks

A Novel Top-k Strategy for Influence Maximization in Complex Networks with Community Structure

Maximizing the Spread of Influence via Generalized Degree Discount

Identifying influential nodes in complex networks

Ranking the spreading ability of nodes in complex networks based on local structure

Identification of influential spreaders based on classified neighbors in real-world complex networks

Identifying influential nodes in complex networks with community structure

Understanding the Spatial Clustering of Severe Acute Respiratory Syndrome (SARS) in Hong Kong

Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations

Modeling the spatial spread of infectious diseases: The GLobal Epidemic and Mobility computational model

A geo-computational algorithm for exploring the structure of diffusion progression in time and space

Impact of Travel Between Patches for Spatial Spread of Disease

The Role of Human Movement in the Transmission of Vector-Borne Pathogens

A Metric of Influential Spreading during Contagion Dynamics through the Air Transportation Network

Ranking spaces for predicting human movement in an urban environment

Detecting the dynamics of urban structure through spatial network analysis

Geographically Modified PageRank Algorithms: Identifying the Spatial Concentration of Human Movement in a Geospatial Network

Modeling human mobility responses to the large-scale spreading of infectious diseases

Identifying Influential and Susceptible Members of Social Networks

Tracking Socioeconomic Vulnerability Using Network Analysis: Insights from an Avian Influenza Outbreak in an Ostrich Production Network

Vulnerability of the British swine industry to classical swine fever

Geographical and temporal distribution of the residual clusters of human leptospirosis in China

Hubs, authorities, and communities

Confirmed imported case of novel coronavirus infection in Singapore; multi-ministry taskforce ramps up precautionary measures

Transportation and territorial development in the Singapore extended metropolitan region

Republic of Singapore. Passenger volume by origin destination bus stops & passenger volume by origin destination train stations

General household survey

Master plan 2014 subzone boundary (no sea

The map equation. The Eur

A k -shell decomposition method for weighted networks

A model of Internet topology using k-shell decomposition

A computer movie simulating urban growth in the Detroit region

Searching for superspreaders of information in real-world social media

The effect of superspreading on epidemic outbreak size distributions

This research was supported by an SUTD grant (Cities Sector: PIE-SGP-CTRS-1803).

W.C.B.C. conceived and conducted the experiment and the data analysis. W.C.B.C. and R.B. analyzed the results and wrote the manuscript. All authors reviewed the manuscript.

The authors declare no competing interests.