Balancing Capacity and Epidemic Spread in the Global Airline Network

Robert Harper and Philip Tee

2020-11-30

Abstract

The structure of complex networks has long been understood to play a role in transmission and spreading phenomena on a graph. This behavior is difficult to model analytically and is most often modeled numerically. Such networks form an important part of the structure of society, including transportation networks. As society fights to control the COVID-19 pandemic, an important question is how to choose the optimum balance between the full opening of transport networks and the control of epidemic spread. In this paper we investigate how recent advances in analyzing network structure using information theory could inform decisions regarding the opening of such networks. By virtue of the richness of data available we focus upon the worldwide airline network, but these methods are in principle applicable to any transport network. We demonstrate that it is possible to substantially open the airline network while retaining some degree of control over the spread of the virus.

1 Introduction

Since December 2019 the world has been adjusting to life with COVID-19 (the disease caused by the SARS-CoV-2 virus), with the first outbreak reported in Wuhan, China [1]. The disease is understood to be a highly contagious viral infection, causing severe respiratory complications and high fatality rates. Societies worldwide are facing the first global pandemic since the 'Spanish Flu' outbreak of 1918; a so-called once-in-a-century event. What is very evident from available data is that the epidemic will, very sadly, continue to claim lives until a cheap and simple treatment or vaccine is available. At the time of writing deaths exceed one million individuals [2], and the 'second wave' is well underway as we head into the northern hemisphere winter.

Public policy towards control of this disease has mostly focused upon social-distancing measures to break the chain of infection. This capitalizes upon the idea that COVID-19 is spread via (at least approximately) person-to-person contact, so that reducing social mixing will significantly slow the epidemic. This conclusion is based upon the well understood models of network epidemiology, which in turn rely upon a contact graph [3, 4, 5] over which the disease spreads. Interruption of this network of social contacts via a lockdown has serious economic consequences. In the initial phases of the pandemic the consequences of the lockdown included a drop of 87.5% in airline traffic in China accompanied by a drop of 21.2% in retail sales [6], and a precipitous contraction of 20.4% in the GDP of the United Kingdom [7], as examples. Although economies recovered moderately in the summer of 2020, economic activity is still well below normal.

The purpose of this paper is to investigate how these two deleterious consequences of the pandemic may be balanced. On the one hand it is unrealistic for economies to remain locked down, and on the other it is vital to control the speed of the epidemic to minimize fatalities. At the heart of economies are transport networks, and the capacity of transport networks is an indicator of economic activity.
The question that we pose concerns how the reduction of capacity in a transport network affects the spread of an epidemic on the same network, as modelled by a site percolation model described in Section 3. We focus upon the airline transport network because detailed data is available on routes [8, 9], capacity and timetables, although the methods and analysis are applicable to any transport network. It is interesting to note that certain Agent Based Models (ABM) of epidemic spread focus upon airports as the principal point of ingress of a disease [10], and so controlling the spread through the airport network would significantly reduce community spread inside geographically isolated nations such as Australia. Specifically, we are able to compare different schemes for the reduction of capacity of the network in terms of their effect on the rate of spread of the epidemic, as measured by the effective reproduction rate $R_e$. We are careful to stress that this is an effective reproduction rate, not $R_0$, as the primary purpose of our model is to experiment with network restriction, not to provide robust predictions of epidemic spread. Nevertheless, we are able to show that there is a significant difference between random closure of airports and routes and a selective method that uses graph properties of the network to select airports and routes to close.

In analyzing the schemes for route limitation in the transport network we focus upon the network structure. It is well known that real world networks, particularly those possessing the scale-free property, have non-trivial behavior as links are progressively removed [11]. In particular, the random removal of links tends not to provoke functional failures of a network (as measured by the number of nodes that become unreachable), but targeted removal of hubs (nodes with a large degree, or number of links) can provoke failure very quickly. It is this asymmetric behavior under link suppression that provides the motivation for our approach, as the connectivity of a network has a profound impact on the speed with which an epidemic can spread, as modeled by a random walk of infectious individuals upon it. In addition to investigating the removal of hubs of large node degree, we also explore the use of vertex entropy, an informational measure of node importance [12, 13, 14, 15], as a scheme for selecting route reduction by closing airports. Vertex entropy is closely correlated with centrality measures, and in general focuses not on the hubs in a network but instead on the 'pinch points' in connectivity. We speculate that these nodes represent important pathways for spreading phenomena, but not necessarily high capacity routes in the network.

The relationship between $R_e$ and network capacity is complex and non-linear. In the cases we examine, we show that for a drop in $R_e$ of 0.15, the capacity of some of the underlying networks can be 50% greater than for the random removal case. We believe this justifies the principle of partial network restriction for epidemic moderation. It is possible to refine further the methods used to model the spread of the disease in a much more granular fashion, but we believe these results provide the motivation to produce such more detailed models, and perhaps form the basis of an alternative approach to managing COVID-19 than repeated and complete closure of transport infrastructure.
We begin in Section 2 with an overview of the necessary concepts of network epidemic models and vertex entropy, in which to frame the experiments we undertake. We describe the simulation and experimentation in Section 3, outlining the construction of both our epidemic spreading model and the route restriction methodology. In Section 4 we discuss the results obtained, and we conclude in Section 5 with an outlook for further work.

As was noted in the introduction, infections require a physical means of transferring from an infectious individual to a susceptible one. Some diseases, such as sexually transmitted ones, require physical contact, whereas airborne infections, believed to include COVID-19, only require proximity. Whatever the biological mechanism, at the core of epidemic spread is a network of contacts. A contact network represents individuals (or places) as nodes, and the links represent contacts along which transmission can occur. The disease then proceeds by transmission along the links, usually governed by a transmission probability. The structure of the network has an important influence on the progression of the epidemic [16, 4], and provides the starting point for the strategy we investigate in this paper.

In particular, many real world networks are known to possess the 'small world' property [17], involving the presence of hubs that create short cuts in the network and dramatically reduce the graph diameter. In principle, fewer network hops are required on average to reach a node, with obvious implications for epidemic spread. This property can be replicated with networks that are generated by the various forms of preferential attachment [18, 19], which produce a scale-free degree distribution. It is a well known claim that real world networks have a power law degree distribution whereby $P(k) \propto k^{-\alpha}$, with values of $\alpha$ typically in the range 2.0-3.0, $P(k)$ being the probability of a randomly chosen node having degree $k$. This has been much disputed [20], and indeed contact networks may not be scale free [21], but transport networks are hypothesized to exhibit reasonably strong scale-freedom [22]. Our analysis of the data does not agree with this conclusion, however. Using the approach outlined in [20], we obtain only weak scale-freedom, with a power law exponent $\alpha = 1.73$ and a better fit for a truncated power-law. This result is intriguing, because although the airline network may not itself be scale-free, it does exhibit similar resilience behavior.

In our work we adapt the percolation approach to modeling epidemic spread [23, 4]. The percolation model depends upon each link in a contact network permitting transmission of the disease according to a probability $T$ called the transmissibility. Effectively, one starts with one infected node and then, as a path of infection is traversed, the link is marked as 'occupied', and the component of the graph connected by such links emerges as a cluster representing the infection. When this sub-graph captures a finite fraction of the nodes, a Giant Component (GC) emerges. The transmissibility governs the size of the GC and is dependent upon the length of time $\tau$ an individual is infectious, and the rate $\beta$ at which an individual infects one of its contacts. Provided $\beta$ is independent of time, in terms of these parameters its value is

$$T = 1 - e^{-\beta\tau}.$$

As this probability varies from 0.0 to 1.0 the size of the GC does not vary smoothly, but in general transitions to a large fraction (say > 50%) at a distinct value of $T$. That is, the epidemic undergoes a phase transition [24], and this critical behavior is a function of the control parameter $T$. Below a critical value $T_c$ the size of the GC is very small and the epidemic is not widespread. Above $T_c$ the spread affects a finite fraction of the population.
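To make the percolation picture concrete, the following is a minimal sketch of bond percolation on a contact network. It is an illustration, not the authors' simulation code: it assumes the networkx library, and a Barabási-Albert graph stands in for a real contact network.

```python
import math
import random
import networkx as nx

def percolation_cluster(G, T, seed):
    """Occupy each edge independently with probability T, then return
    the cluster of nodes reachable from the seed via occupied edges."""
    H = nx.Graph()
    H.add_nodes_from(G)
    H.add_edges_from(e for e in G.edges if random.random() < T)
    return nx.node_connected_component(H, seed)

# Transmissibility from the infectious period tau and infection rate beta.
beta, tau = 0.3, 5.0                        # illustrative values
T = 1 - math.exp(-beta * tau)

G = nx.barabasi_albert_graph(1000, 3)       # stand-in contact network
cluster = percolation_cluster(G, T, seed=0)
print(f"T = {T:.2f}; outbreak reached {len(cluster) / G.number_of_nodes():.0%} of nodes")
```

Sweeping $T$ from 0 to 1 and recording the cluster size reproduces the sharp transition at $T_c$ described above.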
We describe in Section 3 how we apply transmissibility and percolation to our simulation, but at coarse scale it is not meaningful, in a network carrying millions of passengers, to model every interaction between infectious and susceptible passengers. Instead, we use the concept of a contact probability of transmission to determine the number of infected passengers that exit a flight, dependent upon the capacity of the aircraft. This is a key simplification (and vulnerability) of our model, and in future work we intend to expand our model to take account of the person-to-person interactions of infected individuals in a more granular fashion. This detail notwithstanding, it is clear that the value of $\beta$ plays an important role in whether the epidemic spreads at all on our toy model. For the purposes of exploring the effectiveness of our route closure selection schemes, we assume a value of $\beta$ that will lead to an endemic spread of the infection.

The concept of graph entropy was first introduced by János Körner in 1972 [25, 26]. Since then many approaches [27, 28, 29] have emerged to analyze and quantify the information encoded in the structure of a graph, in particular how the potentially vast configuration space of graphs that share common features (such as degree distribution) effectively 'hide' information and therefore have entropy. In essence, graph entropy measures the complexity of a graph, and is neither easy nor efficient to compute. For example, the original definition of Körner relies upon the stable sets of a graph, a well known NP-complete problem. For practical purposes it would be ideal if an approximate, vertex-level measure of entropy were available. One such approach, termed Vertex Entropy (VE), was primarily defined on unweighted, simple graphs [14], based upon a formalism for vertex-level entropy first introduced by Dehmer [12, 13]. Dehmer utilized the concept of a local functional for a vertex, which can be scoped to calculate values for every vertex based upon the local topology of the graph. The degree of locality in the treatment is controlled by using the concept of the j-sphere, $S^j$, in the graph, centered at a given vertex. A j-sphere is a subset of vertices at distance $j$ from a given vertex $v_i$, where distance $d(v_i, v_j)$ is the fewest number of edges in a walk from $v_i$ to $v_j$. This definition excluded the vertex $v_i$, and other interior nodes for $j \geq 1$, but this introduces problematic zeroes when we introduce the clustering coefficient. Accordingly, in [14], we extended the definition to include the vertex $v_i$ as part of the set. The definition of $S^j$ is then modified as follows. For a graph $G(V, E)$, we define for a vertex $v_i \in V$ the j-sphere centered on $v_i$ as

$$S_i^j = \{ v_k \in V : d(v_i, v_k) \leq j \}. \qquad (1)$$

The concept of j-spheres is a convenient formalism to capture locality in the graph; by breaking a large graph into j-spheres, we can progressively examine complex combinatorial quantities such as graph entropy on increasingly larger subsets of the graph.
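Under this extended reading of the definition (all vertices within distance $j$, including $v_i$ itself), a j-sphere corresponds to an ego graph. The sketch below, using networkx, is an illustrative reconstruction rather than code from the paper.

```python
import networkx as nx

def j_sphere(G, v, j):
    """Extended j-sphere: the vertices within distance j of v, including
    v itself. nx.ego_graph with center=True collects exactly these nodes."""
    return set(nx.ego_graph(G, v, radius=j, center=True).nodes)

def j_sphere_edges(G, v, j):
    """The j-sphere edge set E_i^j used below; by construction the ego
    graph retains the edges incident on v, as the modified definition requires."""
    return set(nx.ego_graph(G, v, radius=j, center=True).edges)
```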
We proceed by equipping each $S_i^j$ with a positive, real-valued function $f_i : S_i^j \to \mathbb{R}^+$. This function is proposed to be dependent upon properties of the nodes that are members of the j-sphere, such as their degree, number of cycles and so on, which capture the local structural properties of the graph. We can then construct a well defined probability function for each vertex,

$$p_i = \frac{k_i}{2|E|}, \qquad (2)$$

where $|E|$ is the number of edges in the graph, recalling that $\sum_i k_i = 2|E|$. The vertex entropy is then $H_i = -p_i \log p_i$; this definition of VE was analyzed in [14] and termed Fractional Degree Entropy.

The influence of a vertex within any graph is likely to be related to how it is connected into that graph, not simply by how many edges are incident upon it. To capture this local structure we consider the concept of vertex clustering, first introduced in [17], as a normalizing factor in our definition of a vertex entropy, such that $C_i^j$ represents a local clustering coefficient for the j-sphere of interest. Here, however, we modify the normal clustering coefficient [17] to include edges incident on $v_i$, and we define $C_i^j$ in terms of the number of edges in the j-sphere edge set $|E_i^j|$ as

$$C_i^j = \frac{2|E_i^j|}{|S_i^j|\,(|S_i^j| - 1)}. \qquad (3)$$

And so we have Normalized Fractional Degree Entropy defined as

$$\hat{H}_i = -\frac{1}{C_i^j}\,\frac{k_i}{2|E|} \log \frac{k_i}{2|E|}. \qquad (4)$$

We now introduce an extension of VE to a weighted graph $G(V, E)$ on $N = |V|$ vertices. We define the weighted adjacency matrix $w_{ij}$, such that the weight of an edge $e_{ij} \in E$ is $w_{ij}$, where $w_{ij} = 0$ if there is no edge between $v_i$ and $v_j$, and $w_{ii} = 0$. As we are still dealing with undirected graphs, we can, instead of using the number of edges incident on a vertex, compute a weighted degree $K_i$, such that

$$K_i = \sum_j w_{ij}. \qquad (5)$$

To produce a valid probability we need the sum of this weighted degree. As this sum will be proportional to $2|E|$ we can write $\sum_i K_i = W \times 2|E|$, where $W$ is a constant. It suffices now to normalize the weights, by dividing by $W$, and to define our vertex functional $p_i^w = K_i / W$, with identically $\sum_i p_i^w = 2|E|$. We now use the well defined probability $p_i = p_i^w / 2|E|$ to define our Weighted Fractional Degree Entropy as

$$H_i^w = -p_i \log p_i, \qquad (6)$$

with its normalized counterpart given by

$$\hat{H}_i^w = -\frac{1}{C_i^j}\, p_i \log p_i. \qquad (7)$$
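As a concrete illustration, the sketch below computes the weighted fractional degree entropy for every vertex of a networkx graph, following the reconstructed formulas above (Eqs. (5)-(7)). The clustering normalization is applied with $j = 1$, and the edge attribute name 'passengers' is an assumption for illustration.

```python
import math
import networkx as nx

def weighted_ve(G, weight="passengers", j=1):
    """Weighted Fractional Degree Entropy per vertex, plus its
    clustering-normalized variant: p_i = K_i / (W * 2|E|),
    H_i = -p_i log p_i, and H_i / C_i^j for the normalized form."""
    two_E = 2 * G.number_of_edges()
    K = dict(G.degree(weight=weight))       # K_i = sum_j w_ij
    W = sum(K.values()) / two_E             # since sum_i K_i = W * 2|E|
    H, H_norm = {}, {}
    for v in G:
        p = K[v] / (W * two_E) if W > 0 else 0.0
        h = -p * math.log(p) if p > 0 else 0.0
        ego = nx.ego_graph(G, v, radius=j, center=True)
        n, m = ego.number_of_nodes(), ego.number_of_edges()
        C = 2 * m / (n * (n - 1)) if n > 1 else 0.0   # C_i^j, edges on v included
        H[v] = h
        H_norm[v] = h / C if C > 0 else 0.0
    return H, H_norm
```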
3 Materials and Methods

The airline route graph used in our analyses was constructed using flight data from [8] and airport data from [9]. Each entry within the flight database includes departure and arrival airports, airline and flight number, flight duration, plane capacity, code-share data, annual flight schedule, etc. Best efforts were made to remove duplicate entries arising from code-sharing and multi-leg flights. We define a route as any pair of airports between which there is at least one flight. For each route we aggregate flight details from across the year to give metrics such as flight distance, flights per day, passengers per day, and flight capacity. More formally, we define our airline route graph as an undirected, weighted graph $G(V, E, w_{ij})$, where $V$ is the set of airports, $E$ the set of routes, and $w_{ij}$ the matrix of weights for individual routes, such as capacity in passengers.

Using this weighted graph we simulate the spread of a disease by modelling a random walk of infectious individuals across the route network. As nodes are visited we mark the site as infected and continue our random walk, propagating the epidemic using a coarse-grained approach to infection based on the concept of transmissibility. The coarse graining is obtained by categorizing the routes by the average capacity of flights operating between the airports. We assume that each arriving flight containing one infectious individual yields additional infectious individuals according to the capacity of the aircraft. This is of course not a realistic assumption for a true model of epidemic spread, not least because the serial interval, the time between becoming infected and infecting another person, is estimated to be 3-7 days for COVID-19 [30]. Our justification is that we are effectively 'compressing' time by making this assumption, and we are principally interested in the overall effect of network changes on the spreading phenomenon, not detailed predictions for the actual spread of the disease; that would require a much more detailed model such as an ABM [10]. We average over runs the number of steps it takes for 80% of the nodes to become affected, and then retrospectively compute the $R_e$ value of the infection. The computation of $R_e$ is undertaken by fitting an exponential spread model to the early part of the infection and extracting the reproduction rate.

Every random walk starts at a random node in the graph. The onward step at every stage of every walk is chosen with a probability weighted by the number of passengers travelling along each route from the current location. Intuitively, this describes the case of a passenger randomly picking a single ticket from the set of all tickets available across all flights for all destinations available from that airport.

Our analyses consider five approaches to restricting the size of the network. The first regime is based upon random selection. A further four use targeted approaches based upon the local structure of the graph: node degree; VE, Eq. (4); weighted degree, Eq. (5); and weighted VE, Eq. (7). We use the number of passengers along each route as the weighting criterion. For random removal, and in order to give a more realistic simulation, we consider only those airports categorized as large by the 'ourairports' data [9].

The number of onward infections is chosen in a two-stage process. Our simulation requires discrete walkers. To facilitate this we map a nominal value for $R_e$ (in this case 1.5) onto a discrete probability density function and randomly choose values from that distribution. In order to introduce more realism into the transmission model we factor this value based on the average capacity of the planes serving the route. We use a factor of 2 for medium aircraft (150-300 passengers) and a factor of 3 for larger aircraft. We acknowledge the limitations of this approach.

To cover the different node removal regimes we adopt two subtly different approaches to the random walks. The common elements of both are: nodes are removed from the graph until its size reduces by 20% from its original size; and each random walk terminates once 80% of the graph has become infected. For the targeted approaches we start by sorting all nodes in descending order of the value of the removal criterion, i.e. degree, VE, weighted degree or weighted VE. For each node that is removed we conduct 100 random walks. The random removal regime is more complex as there are two sources of randomness: the random walk itself, and the choice of nodes used to restrict the network. We restrict the graph by randomly removing nodes; for each node that is removed we execute 50 random walks upon the resulting graph. This process is repeated 20 times. In practice, we do this by 'closing' airports and all associated routes. As airports are closed we can compute the passenger carrying capacity of the remaining network, measured in passenger-kms.
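The following sketch shows the core mechanics just described: the passenger-weighted choice of onward route, the targeted restriction of the network, and the passenger-km capacity of what remains. It is a simplified illustration under assumed edge attribute names ('passengers', 'distance_km'), not the authors' simulation code; for brevity it tracks raw node count where the paper tracks the size of the giant component.

```python
import random
import networkx as nx

def next_airport(G, current, weight="passengers"):
    """One walk step: pick a departing route with probability proportional
    to the passengers carried on it (the 'random ticket' picture)."""
    neighbours = list(G.neighbors(current))
    if not neighbours:
        return None
    weights = [G[current][n].get(weight, 1) for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]

def restrict(G, score, fraction=0.20):
    """Close the highest-scoring airports (and all their routes) until the
    graph is `fraction` smaller; `score` maps node -> degree, VE, etc."""
    H = G.copy()
    target = int(G.number_of_nodes() * (1 - fraction))
    for v in sorted(score, key=score.get, reverse=True):
        if H.number_of_nodes() <= target:
            break
        if v in H:
            H.remove_node(v)
    return H

def capacity_pkm(G):
    """Remaining network capacity in passenger-kms."""
    return sum(d.get("passengers", 0) * d.get("distance_km", 0)
               for _, _, d in G.edges(data=True))
```

A full experiment repeats many walks per removed node, samples the number of onward infections from the discrete distribution described above, and fits an exponential to the early growth of each outbreak to extract $R_e$.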
Our experiment is to investigate the relationship between capacity and $R_e$ under these different schemes, with the objective of identifying an optimum method of network restriction. In the following section we present the results of our simulations and highlight the key observations.

4 Results

Our primary objective is to study the impact of five different node removal regimes on epidemic propagation, specifically: random removal of nodes representing large airports, and targeted removal based on the weighted and unweighted forms of degree and vertex entropy. All other free parameters and simulation techniques, such as the transmission mechanism, random walk stopping criteria and route selection, remained the same as described in Section 3.

Fig. 1 shows the qualitative impact of two different node removal strategies. Fig. 1a shows the map of all airports and routes in the global network prior to any node removal. Each dot represents an airport and each line a route. Figs. 1b and 1c show the impact of reducing the size of the original network by 20% using degree and vertex entropy respectively. Red dots represent airports that have been removed from the network, while yellow dots indicate airports that have subsequently become disconnected from the core of the graph. In both removal scenarios the impact on global routes is profound, with large reductions in open routes across the world. Most clear is the reduction in routes across Southern Asia and in the trans-Atlantic regions. Looking more closely at Figs. 1b and 1c, it is interesting to note that the different removal regimes have impacted different geographical regions. The quantity of removed and disconnected airports is noticeably larger for degree-based removal in North America and Europe. On the other hand, VE-based removal has a more uniform impact, resulting in more nodes being removed or disconnected in South America and South-East Asia.

The quantitative impact of the different strategies is shown in Table 1. Comparison of the unweighted degree and VE-based removal strategies shows that ∼36% more nodes need to be removed in the degree-based case to achieve a 20% reduction in the size of the GC. This is consistent with the objective of VE as a measure that identifies single points of failure more accurately than degree. In both cases the number of edges falls, very roughly, by two-thirds; even more striking, however, is the corresponding ∼90% collapse in network capacity. For the VE regime the number of edges and the network capacity are both approximately 25% higher than for the degree-based case. The results for the weighted variants of degree and VE are closer to each other than for the unweighted forms. The VE variant requires ∼7% fewer nodes to be removed; however, the number of edges in the resulting graphs differs by less than ∼1%. This suggests a high degree of similarity between the sets of removed nodes in the weighted cases; we revisit this observation below. In regard to capacity, the overall reduction is slightly higher, at ∼94%. Given that the weighted removal strategies use metrics based upon passenger counts, this higher reduction in capacity may be expected. What is perhaps surprising is that the unweighted measures, which rely only upon network structure, produce such similar reductions. When we compare the results for the weighted and unweighted strategies with each other, different observations emerge.
More routes remain in the weighted cases, but with a lower overall capacity. This suggests that the weighted approaches tend to retain more airports, but airports that serve shorter, less busy routes.

The impact of the different strategies on network capacity as nodes are removed can be seen in Fig. 2. For the case of random removal, capacity reduces approximately linearly as the network shrinks to ∼85% of its original size. As the network reduces further, the rate of reduction in capacity flattens. The degree of scatter in the data also increases at this point, owing to the smaller sample sizes inherent to the random removal process as we approach the 80% threshold. All of the targeted removal strategies exhibit similar characteristics to one another. The very steep reduction in capacity for only small reductions in network size is particularly apparent: a reduction in the size of the network of only a few percent can reduce its capacity by half. Below GC sizes of ∼95% the rate at which capacity drops is reduced, but in all cases the capacity at the 80% cutoff lies in the range 0.6-1.0 × 10^9 pkm. A striking feature of Fig. 2 is the small plateau in capacity for VE near 97.5%. This could be because vertex entropy is disconnecting leaf nodes in the route graph by removing the local hubs that they connect into, hubs which have high vertex entropy but potentially carry few passengers.

We investigate the similarity of the removed node sets by examining how the size of the GC changes as we remove nodes in Fig. 3a, and by comparing the specific node sets identified by the unweighted and weighted removal regimes in Fig. 3b using the well-known Jaccard similarity coefficient (JC). Fig. 3a shows that the GC collapses at about the same rate for both weighted strategies. More interesting is the variation in JC. There is a clear variation in the node sets for about the first 50 nodes to be removed, but apart from the small peak at about 25 nodes the JC exhibits a consistent upward trend to a limit of about 0.8. We can conclude that a relatively large proportion of the nodes identified by the weighted regimes are in fact the same. The unweighted cases behave somewhat differently. Fig. 3a shows that the number of nodes required for the VE-based regime is consistently and considerably less than for the degree-based regime, and as such the GC collapses at a much faster rate. The behavior of the JC, Fig. 3b, is quite erratic and far less consistent than for the weighted regimes. There are multiple peaks and troughs across the full range of removed nodes, suggesting that the unweighted techniques identify a more disparate set of nodes. Interestingly, the trough in JC at about 25 nodes corresponds to the plateau in network capacity observed in Fig. 2 at a GC size of 97.5%.
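The Jaccard comparison above is straightforward to reproduce; a minimal sketch follows, in which the list names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient between two sets of removed airports."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# e.g. similarity of the first n airports closed under two regimes:
# jc_at_n = jaccard(removed_weighted_degree[:n], removed_weighted_ve[:n])
```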
Figure 4: The impact of different node removal strategies on the relationship between network capacity and effective transmission rate.

Fig. 4 shows how the transmission rate on the network varies with its passenger-carrying capacity. In all removal scenarios, a reduction in the capacity of the network reduces the transmission rate. Unsurprisingly, random node removal has the least impact: even when network capacity is reduced by two-thirds, the reduction in the effective reproduction rate $R_e$ is only 7.5%, to 1.85. At the same network capacity, all of the targeted node removal regimes show a far more substantial drop in transmission rate, to about 1.72, an overall reduction of about 14% and almost double the reduction achieved by random removal.

An alternative analysis is to examine the capacity required to achieve a target transmission rate. Inspecting Fig. 4, for random node removal, a reduction of 0.15 in $R_e$ can be achieved by reducing network capacity to 3.2 × 10^9 pkm. For the same reduction in $R_e$, targeted node removal requires a capacity of about 5.0 × 10^9 pkm, a significant improvement. We conclude that we can achieve the same $R_e$ but retain about 50% more capacity using targeted node removal. Cross-referencing these capacities with Fig. 2 shows that about 650 nodes need to be removed or disconnected using random removal. Under targeted removal, and depending upon the specific regime, only about 70-140 nodes need to be removed. Further examination of Fig. 4 shows that for capacities above about 3.0 × 10^9 pkm the targeted removal regimes behave similarly. Below 3.0 × 10^9 pkm their behaviors diverge a little, with the unweighted forms giving larger reductions in transmission rate. This behavior is consistent with our earlier observation that the weighted regimes retain more flights and hence provide more opportunity for transmission to occur. We also observe that at the lowest capacities, both VE-based regimes produce larger reductions in transmission than their degree-based counterparts.

5 Conclusions

In our analysis we have demonstrated that it is possible to choose a scheme for route restriction in transport networks that optimizes capacity whilst restricting epidemic spread. In particular, by using either weighted vertex entropy or degree scoring of airports, we are able to reduce the effective reproduction rate of an epidemic whilst preserving capacity much more effectively than with a random airport and route closure approach. Owing to simplifications in the model, we should not place too much reliance upon the absolute values of $R_e$. In relative terms, however, and for a reduction in $R_e$ of 0.15, targeted node removal yielded networks with 50% more capacity than those restricted by random selection.

This result would perhaps be less surprising if the airline network were robustly scale-free, but as stated before this is not conclusively true. For scale-free networks it is well understood that the 'friendship paradox' [31] exploits high degree hubs to propagate spreading phenomena. Further, although vertex entropy is defined as a function of degree, the entropy-based route restriction has a very different effect upon capacity and epidemic spread in the low restriction regime. Our result perhaps exposes a subtle interplay between network function and structure.

The purpose of our work is to motivate a discussion on how we can moderate our approach to extreme shutdown measures as a way to manage the public health challenges of COVID-19. We believe there are many ways to improve the robustness of our model, and in further work we intend to refine our results by taking a route-by-route approach to restrictions and also implementing a finer grained spreading model. In particular, we have not constrained the route restriction by other criteria, such as retaining, or conversely severing, communications between particular geographic regions. These additional constraints would likely change radically the conclusions of our experiments. Despite the acknowledged shortcomings, we feel that our result justifies further exploration.
References

The outbreak of COVID-19: An overview
Epidemic dynamics and endemic states in complex networks
Spread of epidemic disease on networks
Mathematics of epidemics on networks
Cascading economic impacts of the COVID-19 outbreak in China
GDP monthly estimate
Official Aviation Guide (OAG) yearly historic flight schedules
Investigating spatiotemporal dynamics and synchrony of influenza epidemics in Australia: An agent-based modelling approach
Error and attack tolerance of complex networks
Information processing in complex networks: Graph entropy and information functionals
A history of graph entropy measures
Vertex entropy as a critical node measure in network monitoring
Relating vertex and global graph entropy in randomly generated graphs
Beyond COVID-19: Network science and sustainable exit strategies
Collective dynamics of 'small-world' networks
Network Science
Statistical mechanics of complex networks
Scale-free networks are rare
Estimating potential infection transmission routes in hospital wards using wearable proximity sensors
Network characteristics of air traffic in the continental United States
Epidemics and percolation in small-world networks
Critical phenomena in complex networks
Fredman-Komlós bounds and information theory
Graph entropy: A survey
The von Neumann entropy of networks
The entropy of randomized network ensembles
Entropy of network ensembles
A systematic review of COVID-19 epidemiology based on current evidence
Why your friends have more friends than you do