key: cord-0980647-pvaa4jv2
authors: Schimit, P.H.T.; Pereira, F.H.
title: Disease spreading in complex networks: A numerical study with Principal Component Analysis
date: 2018-05-01
journal: Expert Syst Appl
DOI: 10.1016/j.eswa.2017.12.021
sha: 3a2039049a4c65892b962dbd5c90e170aea42ddb
doc_id: 980647
cord_uid: pvaa4jv2

Disease spreading models need a population model to organize how individuals are distributed over space and how they are connected. Usually, the disease agent (bacterium, virus) passes between individuals through these connections, and an epidemic outbreak may occur. Here, complex network models such as Erdős–Rényi, small-world, scale-free and Barabási–Albert are used to model a population, since they are commonly used for social networks, and the disease is modeled by a SIR (Susceptible–Infected–Recovered) model. The objective of this work is to analyze, regardless of the network/population model, which topological parameters are most relevant for the success or failure of a disease. To this end, the SIR model is simulated over a wide range of each network model and a first analysis is done. Using data from all simulations, an investigation with Principal Component Analysis (PCA) is then carried out to find the most relevant topological and disease parameters.

Disease spreading has been modeled with different mathematical tools, from the ordinary differential equations (ODE) of the Kermack and McKendrick SIR (Susceptible-Infected-Recovered) model (Anderson & May, 1991; Kermack & McKendrick, 1927) to multi-agent systems with large computational demand (Balcan et al., 2010). The usual objectives of these studies are to analyze and understand how an epidemic outbreak occurs in a region and to look for control strategies to combat it (Anderson & May, 1991). ODE models assume that individuals in different disease states are well mixed and homogeneously distributed over space, a limitation that is acceptable for a wide range of diseases (Roy & Pascual, 2006).
However, when the spatial factor is important, other tools are needed, such as the concept of a graph, or network (Albert & Barabasi, 2002). In this case, the network (population) is formed by nodes (individuals) connected by edges (social and/or spatial contact) (Boccaletti, Latora, Moreno, Chavez, & Hwang, 2006). In the set of networks, the regular networks (all nodes have the same number of connections with other nodes, for instance) do … complex connection structures (Franc, 2004; Sander, Warren, Sokolov, Simon, & Koopman, 2002), considering spatial pattern (Dorjee, Revie, Poljak, McNab, & Sanchez, 2013; Rautureau, Dufour, & Durand, 2010; van Ravensway et al., 2012; Westgarth et al., 2009), and also adopting small-world (Moore & Newman, 2000) and scale-free (Colizza, Barthélemy, Barrat, & Vespignani, 2007) models (which will be explored in the next section). Given the flexibility of this framework, a wide range of problems have been approached with complex network models, for instance: analysis of a zooplankton community (Raymond & Hosie, 2009), Buruli ulcer in Victoria, Australia (van Ravensway et al., 2012) and swine shipments in Ontario, Canada (Dorjee et al., 2013); exploration of the network formed by dogs in a community (Westgarth et al., 2009); and a study of the epidemic data of SARS (Severe Acute Respiratory Syndrome) in Beijing, China (Zhong, Huang, & Song, 2009). By using complex networks in these circumstances, it is possible to find relations between the population structure and disease characteristics. Such structure is measured by the topological parameters of the network (for instance, the clustering coefficient and the shortest path, which will also be explored in the next section) (Keeling, 2005). However, depending on the problem, the population may need a proper mathematical tool to treat space as an important factor, such as cellular automata (Holko, Mędrek, Pastuszak, & Phusavat, 2016).
More specifically, complex network approaches have proven to be a suitable tool for building expert systems, most notably in the social sciences (Legara, Monterola, & David, 2013; Wachs-Lopes & Rodrigues, 2016). In general, the complex network architecture is used to build and evaluate prediction models. The effect of network behavior and topology on model performance is also frequently evaluated (Óskarsdóttir et al., 2017). In linguistics, for example, where many studies have emerged due to the explosive growth of the Internet, a complex network model for the semantic representation of human language behaves as a scale-free network (Wachs-Lopes & Rodrigues, 2016). In this context, feature or attribute selection, which searches for the best subset of attributes in a dataset, is a useful method that leads to less redundant data, improved modeling accuracy and reduced processing time for training expert systems (Aladeemy, Tutun, & Khasawneh, 2017; Elangovan, Devasenapati, Sakthivel, & Ramachandran, 2011).

Control strategies that consider topological properties emerged as an alternative view for deciding how to combat an epidemic outbreak. In Oleś, Gudowska-Nowak, and Kleczkowski (2012), the size of the neighborhood is considered for an optimal strategy in economic and epidemic terms; Oleś, Gudowska-Nowak, and Kleczkowski (2014) present a study of cost-benefit control methods related to topological parameters; and Xiao, Zhou, and Tang (2011) demonstrate the differences in control strategies for random and small-world networks. Control methods in random networks suggest that it is better to focus control activities on highly connected individuals (Jeger, Pautasso, Holdenrieder, & Shaw, 2007).
However, in some types of networks, topological parameters seem not to be an efficient way to understand an epidemic outbreak, due to the wide range of networks that can be created for a given set of topological parameter values (Moslonka-Lefebvre et al., 2009; Schimit & Monteiro, 2009). Accordingly, in this paper we use a fixed SIR model in populations modeled by random, small-world, scale-free and Barabási–Albert networks to verify relations between disease characteristics and topological parameters, in order to investigate whether a given parameter and/or set of parameters can be used to predict disease spreading for all networks and/or a set of networks.

Finally, Principal Component Analysis (PCA) is a simple multivariate analysis based on the eigenvalue decomposition of a data covariance matrix; its objective is to build a lower-dimensional picture of the data that reveals the internal structure that best explains the variance. Consequently, PCA is often used when a system has many input variables and it is necessary to find those that most influence the output (Jolliffe, 2002). Therefore, we use different complex network models to model a population and a simple SIR model to model the disease. The objective of this work is to analyze, by using PCA and regardless of the network/population model, which topological parameters are most relevant for the success or failure of a disease.

From an epidemiological point of view, this methodology complements works that deal with partial information either to extract characteristics of disease outbreaks (Colizza & Vespignani, 2008; Moreno, Pastor-Satorras, & Vespignani, 2002) or to decide control actions (Oleś et al., 2012; 2014; Xiao et al., 2011). By using a wider range of population structures, it is possible to measure disease strength regardless of the structure model.
From an expert and intelligent systems point of view, the methodology proposed for dynamical populations may be implemented for other problems (Bajer, Martinović, & Brest, 2016; Chang, Chen, & Lin, 2005; Li, Zhang, & Zeng, 2009; Simidjievski, Todorovski, & Džeroski, 2015).

Complex networks have been frequently used to model populations in disease spreading models (Albert & Barabasi, 2002; Boccaletti et al., 2006; May, 2006; Zhou, Fu, & Wang, 2006; Trapman, 2007; Zhong et al., 2009). Although the proposed methodology is an innovative approach for handling any type of network, it does not consider some specific attributes and results. For instance:
• it only considers the SIR model (and not, for instance, SEIR, that is, SIR with an Exposed state; Keeling, Rand, & Morris, 1997; Verdasca et al., 2005);
• there is no variation of disease parameters (Moore & Newman, 2000; Verdasca et al., 2005), though here different parameters lead to dynamically equivalent results;
• it approximates the calculation of the basic reproduction number by ordinary differential equations, which are usually used for a homogeneously mixing population. Although the results were good even for heterogeneous networks, some works use other parameters to analyze disease strength (Pellis, Ferguson, & Fraser, 2009);
• some diseases are strongly influenced by space, and a complementary model may be necessary to handle it (Bigras-Poulin, Thompson, Chriel, Mortensen, & Greiner, 2006; Riley, 2007; Tildesley et al., 2010; Vazquez-Prokopec, Kitron, Montgomery, Horne, & Ritchie, 2010). Such spatial dependence is not considered in this paper; and
• it cannot be used for global approaches (Balcan et al., 2010; Wang, Li, Zhang, Zhang, & Zhang, 2011).

This paper is organized as follows: in the next section, some basic concepts of graphs/networks are presented, and in Section 3 the first results of the model are explored.
In Section 4, a more robust analysis is made by using PCA and, in Section 5, we present a final discussion.

Topological parameters help to identify some properties of a network. Consider a network $G$ with $n$ nodes. The maximum number of edges occurs when the network is fully connected and is equal to $n(n-1)/2$. The distance between nodes $i$ and $j$ is the number of edges $l_{ij}$ that make up the shortest path between the nodes. Here, we use the following topological parameters as analysis variables: average shortest path, density, diameter, clustering coefficient, average degree and maximum degree (Albert & Barabasi, 2002; Boccaletti et al., 2006; Newman, 2010). The average shortest path of the network ($spl$) is the average value of $l_{ij}$ over every pair $i$ and $j$, that is, $l = \frac{1}{n(n-1)} \sum_{i \neq j} l_{ij}$. Let $e$ be the number of edges in the network. Density is the ratio between the number of edges and the number of all possible edges for a network, that is, $den = 2e / (n(n-1))$. If we consider the maximum value of $l_{ij}$, we define the diameter $diam = \max(l_{ij})$, with $1 \le i, j \le n$ and $i \neq j$, which represents the longest shortest path of the network (Boccaletti et al., 2006). Finally, Watts and Strogatz (1998) introduced the clustering coefficient, which is the ratio between the number of connections $b_i$ that exist between the neighbors of $i$ and the maximum possible number of such connections. Let $k_i$ be the degree of a node, that is, the number of neighbors of node $i$. Thus, the clustering coefficient for $i$ is $cc_i = 2 b_i / (k_i (k_i - 1))$, and the average clustering coefficient is given by $cc = \frac{1}{n} \sum_{i=1}^{n} cc_i$. Here, we also use the average degree ($\langle k \rangle = \sum_{i=1}^{n} k_i / n$) and the maximum degree $k_{max} = \max(k_i)$, $1 \le i \le n$, to analyze a network.

One of the first complex network models was formulated by Erdős and Rényi (1959). Based on completely random graphs, $n$ nodes are connected by $e$ edges randomly chosen among the $n(n-1)/2$ possible edges, that is, a fraction $q = e / (n(n-1)/2)$ of the edges form the connections of the network.
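As an illustration of the definitions above, the following is a minimal sketch in plain Python; it is not the iGraph code used for the simulations in this work. The function names `erdos_renyi`, `bfs_distances` and `topological_parameters` are hypothetical. Density is taken as the fraction of the n(n-1)/2 possible edges that are present. Two further choices are assumptions made only for this sketch: nodes with fewer than two neighbors get a clustering coefficient of zero, and path lengths are averaged over connected pairs only.

```python
import random
from collections import deque
from itertools import combinations


def erdos_renyi(n, q, seed=0):
    """Random graph: each of the n(n-1)/2 possible edges is kept with probability q."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < q:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths l_ij (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topological_parameters(adj):
    """Return spl, den, diam, cc, avdeg and maxdeg for an undirected graph (n >= 2)."""
    n = len(adj)
    degrees = [len(nb) for nb in adj.values()]
    e = sum(degrees) // 2
    # Distances over all connected ordered pairs (i != j).
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    # Local clustering: b_i edges among the k_i neighbours, out of k_i(k_i-1)/2.
    local_cc = []
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            local_cc.append(0.0)  # assumption: undefined cases count as zero
            continue
        b = sum(1 for u, w in combinations(nb, 2) if w in adj[u])
        local_cc.append(2.0 * b / (k * (k - 1)))
    return {
        "spl": sum(lengths) / len(lengths) if lengths else float("nan"),
        "den": 2.0 * e / (n * (n - 1)),
        "diam": max(lengths) if lengths else 0,
        "cc": sum(local_cc) / n,
        "avdeg": sum(degrees) / n,
        "maxdeg": max(degrees),
    }
```

For example, `topological_parameters(erdos_renyi(200, 0.05))` returns the six topological variables for one random graph; the breadth-first search makes the cost grow roughly with n times the number of edges, which is adequate for small illustrative networks.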
Watts and Strogatz (1998) also created an algorithm to generate a network with an average shortest path similar to that of an Erdős–Rényi network (which is usually small) but with a higher average clustering coefficient, closer to that of social networks. Consider a regular topology, that is, each node is connected to its $m$ closest individuals. Then rewire a fraction $p$ of the connections, and the network model is done. Note that such a model is mainly locally connected, with long-distance random connections. When $p = 1$, the final network is totally random, as in the Erdős–Rényi model.

Another typical property of real networks is the rich-get-richer rule when creating the network, that is, new nodes are more likely to connect to nodes with high degree. For these real networks, the degree distribution follows the expression $P(k) \sim k^{-\gamma}$, with $\gamma \approx 2.2$ (Albert & Barabasi, 2002; Newman, 2010). A degree distribution $P(k) = A k^{-\gamma}$, with $A$ and $\gamma$ constants, is named scale-free. Here, scale-free networks will be created by determining the fraction $p$ of edges to be added (from all possible) and the power-law exponent of the degree distribution (Bollobás, Riordan, Spencer, & Tusnády, 2001).

Barabási and Albert proposed a rule derived from scale-free models, the preferential attachment (Barabási & Albert, 1999). In this rule, the probability $q$ that a new node will connect to a node $i$ is proportional to the degree of that node, $q_i = k_i / \sum_j k_j$. Here, Barabási–Albert networks will be created by determining the number of edges that each node will connect and the power of the preferential attachment, that is, the probability that a node $i$ receives an edge is proportional to $k_i$ raised to that power.

The SIR model used in the simulations is the same as in Schimit and Monteiro (2009). However, here each node represents an individual, which may be in one of the disease states Susceptible, Infected and Recovered.
The possible state transitions are listed below:
• A Susceptible individual may be infected with a probability that depends on $v$ and $k$, where $v$ is the number of infected neighbors (that is, Infected nodes at distance 1) and $k$ is a parameter related to the disease;
• An Infected individual may be cured with probability $P_c$;
• An Infected individual may die due to disease consequences with probability $P_d$;
• A Recovered individual may die due to natural causes with probability $P_n$;
• Susceptible, Infected and Recovered individuals may continue in the same state after a time step.

In Roy and Pascual (2006), based on a previous model from Keeling et al. (1997), a comparison between ODE approaches (pairwise formulation, heterogeneous mixing model and mean-field approximation) is presented. Although the first two approaches exhibit important dynamical properties, the system equilibrium can be analyzed by using the mean-field approximation. Therefore, here we consider individuals in different states to be homogeneously distributed over the network that represents the population, since the purpose of the ODE is to calculate the parameter $R_0$, the basic reproduction number, which will be defined next. The state transitions listed above can be interpreted as rates in the ODE, and the equations are:

$$\frac{dS}{dt} = -aSI + cI + eR, \qquad \frac{dI}{dt} = aSI - bI - cI, \qquad \frac{dR}{dt} = bI - eR, \qquad (1)$$

where $a$ is the infection rate constant; $b$ is the recovery rate constant; $c$ is the death rate constant related to the disease; $e$ is the death rate constant related to natural causes; and $N = S + I + R$ is the constant total population. The stationary solutions $(S^*/N, I^*/N, R^*/N)$ (where $S^*$, $I^*$ and $R^*$ are constants) are the disease-free state $(1, 0, 0)$ and the endemic state $\left(\frac{1}{R_0}, \frac{e(R_0 - 1)}{R_0 (b + e)}, \frac{b(R_0 - 1)}{R_0 (b + e)}\right)$, where $R_0 = aN/(b + c)$ is the basic reproduction number. A stability analysis (Monteiro, Sasso, & Berlinck, 2007) of Eq. (1) reveals that the disease-free stationary state is asymptotically stable if $R_0 < 1$ and unstable if $R_0 > 1$, and that the endemic stationary state is unstable if $R_0 < 1$ and asymptotically stable if $R_0 > 1$. Moreno et al.
(2002) studied a similar model and showed that, for networks with finite average degree and finite quadratic average degree, there is a critical value (a function of epidemiological and network parameters) that indicates whether or not the disease will spread in the population. Furthermore, $a$, $b$, $c$ and $e$ can be estimated from simulations, since the ODE model is a mean-field approximation. The expressions that link these models are taken from Schimit and Monteiro (2009) and form Eq. (2). Note that the rates of the ODE are related to the probabilities of the cellular automata.

Principal Component Analysis (PCA) is one of the most popular methods for dimensionality reduction of a feature set. PCA projects a dataset $X$ onto an orthonormal base in $\mathbb{R}^N$, which is defined as a set of $p$ eigenvectors $e_i \in \mathbb{R}^N$, $i = 1, \dots, p$, of the covariance matrix of $X$. This orthonormal base is oriented in the directions that provide the maximum variance of $X \in \mathbb{R}^N$, in order to carry the most relevant information. The principle of dimensionality reduction is the representation of the dataset $X$ in terms of the eigenvectors of the covariance matrix, which are called principal components (Jolliffe, 2002). In order to accomplish the dimensionality reduction, the dataset is represented as a real matrix $U_{n \times N}$, where $n$ and $N$ are the number of rows and columns, respectively. Each row of $U$ corresponds to an $N$-dimensional point and the columns represent the values of the $N$ original variables. The covariance matrix of $U$ is calculated, as well as its eigenvalues and corresponding eigenvectors. These eigenvectors form a set of linearly independent vectors, i.e., a base $\{\phi_i\}$, $i = 1, \dots, n$, which constitutes a new axis system (Guo, Wu, Massart, Boucon, & Jong, 2002). Finally, to perform the dimensionality reduction, the rows of $U$ are projected onto the base formed by the $p$ eigenvectors related to the largest eigenvalues ($p \ll n$).
The coordinates of $U$ projected onto this reduced $p$-dimensional subspace are denoted as $U\phi_1, U\phi_2, \dots, U\phi_p$. As a result of the process presented above, PCA returns a projection in the new space that is different from the original data. Usually, it is necessary to select the most relevant attributes without changing their values, that is, to accomplish dimensionality reduction of a feature set by choosing a subset of the original features that contains most of the essential information (Guo et al., 2002; Guyon, 2003). The proposed approach for this problem, called principal feature analysis (PFA), is based on a method presented by Lu, Cohen, Zhou, and Tian (2007). The algorithm can be summarized in the following steps:
1. Compute the covariance matrix of a zero-mean $n$-dimensional feature vector $X$ and its eigenvalues and eigenvectors $\phi$;
2. Choose the subspace dimension $p$ and construct the matrix $A_p$ with the first $p$ principal eigenvectors;
3. Calculate the projections of each point onto the PCA subspace. As a result, we have a new set of $p$ projected variables $U\phi_1, U\phi_2, \dots, U\phi_p$;
4. Define a contribution index of each original variable (columns of $U$) to the projection as a weighted sum of the inner products between the variable and each principal component. This contribution index is directly related to the cosine of the angle between the original variable and each principal component in Euclidean space. The weights are taken as the amount of data variation explained by each principal component. Thus, the principal feature is chosen as the variable with the largest contribution index.

As opposed to the original PCA method, which projects the original data onto a subspace of eigenvectors, the PFA approach selects the most relevant attributes without changing their values.
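The four steps above can be sketched in plain Python as follows. This is one possible reading of step 4, not the authors' implementation: the contribution of a variable is taken here as the sum, over the first p principal components, of the explained-variance fraction times the absolute cosine between the centered variable and the projected component. The absolute value, the normalization, the small Jacobi eigen-solver and all function names (`covariance`, `jacobi_eigh`, `contribution_index`, `principal_feature`) are assumptions introduced for illustration.

```python
import math


def covariance(rows):
    """Center the columns of rows (n samples x N variables) and return (centered, cov)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    return centered, cov


def jacobi_eigh(sym, sweeps=100, tol=1e-18):
    """Eigenvalues (descending) and eigenvectors (one per row) of a small symmetric matrix."""
    m = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j) < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for mat in (a, v):  # rotate columns p and q of A and of V
                    for k in range(m):
                        mp, mq = mat[k][p], mat[k][q]
                        mat[k][p], mat[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(m):  # rotate rows p and q of A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(m), key=lambda j: -a[j][j])
    return [a[j][j] for j in order], [[v[k][j] for k in range(m)] for j in order]


def contribution_index(rows, p):
    """Weighted sum of |cos(angle)| between each variable and the first p components."""
    centered, cov = covariance(rows)            # step 1
    eigvals, eigvecs = jacobi_eigh(cov)
    total = sum(eigvals)
    m = len(cov)
    index = [0.0] * m
    for lam, phi in zip(eigvals[:p], eigvecs[:p]):   # step 2: first p eigenvectors
        scores = [sum(x * w for x, w in zip(row, phi)) for row in centered]  # step 3
        s_norm = math.sqrt(sum(t * t for t in scores))
        for j in range(m):                      # step 4
            col = [row[j] for row in centered]
            c_norm = math.sqrt(sum(t * t for t in col))
            if s_norm == 0 or c_norm == 0:
                continue
            cos = sum(x * t for x, t in zip(col, scores)) / (c_norm * s_norm)
            index[j] += (lam / total) * abs(cos)
    return index


def principal_feature(rows, p):
    """Column position of the variable with the largest contribution index."""
    index = contribution_index(rows, p)
    return max(range(len(index)), key=index.__getitem__)
```

The Jacobi solver is used only to keep the sketch free of external dependencies; with a numerical library available, any symmetric eigen-solver could replace it without changing the contribution index.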
Such selection considers a subset of the original features based on the distance between these features and the principal components that contain most of the essential information, as defined in step 4.

In order to compare disease spreading on networks, the epidemiological parameters of the model presented previously are fixed: $k = 0.1$, $P_c = 60\%$, $P_d = 30\%$ and $P_n = 10\%$ (Schimit & Monteiro, 2009) … the system already reached the permanent regime. In the beginning, the population network is created and it remains fixed throughout the simulation, that is, individuals always have the same neighborhood. Fig. 1 exhibits the temporal evolutions for (a) Erdős–Rényi, (b) small-world, (c) Barabási–Albert and (d) scale-free networks. The value of $R_0$ is indicated in each panel, as well as the average clustering coefficient and the average shortest path of each network. Light gray lines exhibit the corresponding disease states for ODE simulations whose parameters were calculated from network simulations using Eq. (2). Note that the networks have similar topological parameters; however, $R_0$ and the disease dynamics differ from one network to another. Furthermore, the temporal evolutions of the ODE and network models are different, though the percentages of individuals in each state at the steady state are similar. A good overview of the visual differences in how each network is created can be found in Shirley and Rushton (2005). Therefore, here we simulate the disease spreading over a wide range of topological parameters for each complex network model. The tool for generating these networks is the C/C++ library iGraph (Csardi & Nepusz, 2006). The next sections formalize how the networks are stressed.

Considering an Erdős–Rényi network, a fraction $p$ of all the possible edges is added to the network, that is, each possible edge has a probability $p$ of being added. The iGraph environment requires the value of $p$; thus, the epidemiological model is simulated for each network with $p$ in the range 0.0001:0.0001:0.5.
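A simulation of this kind can be sketched in plain Python as below. Several details are assumptions made only for illustration, because they are not recoverable from the text here: the infection probability is taken as 1 - exp(-k v), an Infected individual is first tested for cure and only then for death, every death is replaced by a Susceptible individual so that the population stays constant, and the population size, the number of time steps and the initial number of Infected individuals are hypothetical. The function names `random_graph`, `sir_step` and `simulate` are likewise hypothetical, and the graph generator stands in for the iGraph routine.

```python
import math
import random
from itertools import combinations


def random_graph(n, p, rng):
    """Erdős–Rényi-style graph: each possible edge is added with probability p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def sir_step(adj, state, rng, k=0.1, p_c=0.6, p_d=0.3, p_n=0.1):
    """One synchronous time step; returns the new state of every node."""
    new_state = {}
    for node, s in state.items():
        if s == "S":
            v = sum(1 for nb in adj[node] if state[nb] == "I")
            # Assumed form of the infection probability: 1 - exp(-k v).
            new_state[node] = "I" if rng.random() < 1.0 - math.exp(-k * v) else "S"
        elif s == "I":
            if rng.random() < p_c:
                new_state[node] = "R"      # cured
            elif rng.random() < p_d:
                new_state[node] = "S"      # death by disease, replaced by a Susceptible (assumption)
            else:
                new_state[node] = "I"
        else:
            # Natural death of a Recovered individual, replaced by a Susceptible (assumption).
            new_state[node] = "S" if rng.random() < p_n else "R"
    return new_state


def simulate(n=500, p=0.02, steps=200, initial_infected=5, seed=0):
    """Return the list of (S, I, R) counts recorded before each time step."""
    rng = random.Random(seed)
    adj = random_graph(n, p, rng)          # the network stays fixed during the run
    infected = set(rng.sample(range(n), initial_infected))
    state = {u: ("I" if u in infected else "S") for u in range(n)}
    history = []
    for _ in range(steps):
        history.append(tuple(sum(1 for x in state.values() if x == c) for c in "SIR"))
        state = sir_step(adj, state, rng)
    return history
```

From the returned history, quantities such as the steady-state fractions of S, I and R, the height of the first Infected peak and the time step at which it occurs can be read directly, for one value of p at a time.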
In these simulations, the average clustering coefficient results in values $0 \lesssim cc \lesssim 0.5$, the average shortest path in $1.5 \lesssim spl \lesssim 13$, the diameter in $2 \lesssim diam \lesssim 8$, and the density, …

On small-world networks, each node starts with $m$ connections to its closest individuals. Then each connection is rewired with probability $p$, that is, such a connection is removed and any of the possible edges in the graph may be added in its place. The iGraph environment requires the values of $m$ and $p$; thus, the epidemiological model is simulated for each network with $p$ in the range 0.01:0.01:1, and $m$ in the range 1:1:150. In these simulations, the average clustering coefficient results in values $0 \lesssim cc \lesssim 0.75$, the average shortest path in $1.78 \lesssim spl \lesssim 125$, the diameter in $2 \lesssim diam \lesssim 6$, and the density in $0 \lesssim den \lesssim 0.2$. Fig. 3 exhibits how these properties influence the value of $R_0$. Note that small-world networks are less dense than Erdős–Rényi networks with the same potential for disease spreading, depending on other topological features. Also, here, the clustering coefficient is not enough to determine the value of $R_0$, and another parameter is needed to verify disease spreading properties. The separated dots in the shortest path and diameter panels are related to $p = 0$, when the network is regular with each node having the same number of connections $m$.

Fig. 3. Small-world simulations with $R_0$ as a function of the topological parameters clustering coefficient, shortest path, density, diameter, average degree and maximum degree.

For scale-free networks, the number of edges $e$ in the graph and the power-law exponent $\gamma$ determine the generation. That is, $e$ edges are added to the network, and the probability that a node is chosen to get an edge is given by $P(k) = k^{\gamma}$, where $k$ is the node degree. The iGraph environment requires the values of $e$ and $\gamma$; thus, the epidemiological model is simulated for each network with a fraction of possible edges $q$ in the range 0.05:0.05:0.6, and $\gamma$ in the range 2:0.1:6.
In these simulations, the average clustering coefficient results in values $0 \lesssim cc \lesssim 0.6$, the average shortest path in $1.4 \lesssim spl \lesssim 4.48$, the diameter in $2 \lesssim diam \lesssim 6$, and the density in $0 \lesssim den \lesssim 0.6$. Fig. 4 exhibits how these properties influence the value of $R_0$. The scale-free network model allows a good range of topological parameters for the epidemiological model. Note that the model needs more edges in order to exhibit values of $R_0$ similar to those of a small-world network, which is not so dense.

Fig. 4. Scale-free simulations with $R_0$ as a function of the topological parameters clustering coefficient, shortest path, density, diameter, average degree and maximum degree.

The Barabási–Albert network is a subset of scale-free networks. The difference is how the network is created, because Barabási–Albert requires the exponent $\gamma$ for the probability of a node being chosen to get an edge, $P(k) = k^{\gamma}$, and the number of outgoing edges generated for each node, $m$. The iGraph environment requires the values of $m$ and $\gamma$; thus, the epidemiological model is simulated for each network with $m$ in the range 5:5:200, and $\gamma$ in the range 2:0.1:5. In these simulations, the average clustering coefficient results in values $0.01 \lesssim cc \lesssim 0.48$, the average shortest path in $1.67 \lesssim spl \lesssim 2.42$, the diameter in $2 \lesssim diam \lesssim 4$, and the density in $0 \lesssim den \lesssim 0.36$. Fig. 5 exhibits how these properties influence the value of $R_0$. Such a construction model generates networks with high-degree nodes, and the consequence is the small range of the average shortest path. However, even for such a small range, $R_0$ falls abruptly from $R_0 \sim 12$ when the average shortest path is $spl \sim 1.6$ to $R_0 \sim 2$ when the average shortest path is $spl \sim 2.4$.

In order to show the need for a more robust statistical analysis of all network data, all simulation results are shown in Fig. 6. Note that the average clustering coefficient, average shortest path, diameter and maximum degree are not enough to clearly identify a prediction for $R_0$.
Although there is variance in the data, density and average degree show trends that allow a prediction of $R_0$. Moreover, $R_0 > 1$, i.e., the disease persists in the population, when $den \gtrsim 0.01$ and when the average degree $avdeg \gtrsim 10$.

Fig. 6. Data for all networks put together, with $R_0$ as a function of the topological parameters clustering coefficient, shortest path, density, diameter, average degree and maximum degree.

Therefore, PCA has been used to find other relationships between disease and network parameters. The variables used were: average clustering coefficient ($cc$); average shortest path ($spl$); density ($den$); diameter ($diam$); average degree ($avdeg$); maximum degree ($maxdeg$); number of Susceptible ($S$), Infected ($I$) and Recovered ($R$) individuals when the system reached the permanent regime; Infected peak ($Ip$), i.e., the number of Infected individuals in the initial outbreak of the disease; and instant of Infected peak ($iIp$), which is the time step when the peak occurred. All these 12 variables have been considered for all 41,270 experiments of all networks, and Fig. 7 contains the normalized projection of each variable. Note that, according to PCA, the internal structure that best explains the variance in the data has $maxdeg$, $Ip$, $R$ and $avdeg$ as the most informative variables. Fig. 6 already exhibited $R_0$ as a function of $maxdeg$, and this variable certainly does not explain the disease variables. Actually, the maximum degree of the network is very sensitive to the other topological parameters for all networks. Thereby, the relationships in Figs. 8-11 are based on the PCA results. Fig. 8 shows that small values of the average degree are enough for a high peak of infected individuals, and that the trend of an increasing $I(t)$ peak changes at around 300, when the peak starts to decrease. Fig. 9 indicates that the sooner the $I(t)$ peak occurs, the higher the peak is. Fig. 10 contains the same data as Fig. 6 for the average degree, but on a different scale.
Somehow, PCA confirms the importance of the average degree for analyzing disease spreading. Note that, in all figures, the value of $R_0$ saturates. In this condition, the term $aS(t)I(t)$ of Eq. (1) can be written as $aS(t)I(t) = S(t)$, since all Susceptible individuals become infected. Accordingly, the new equations are:

$$\frac{dS}{dt} = -S + cI + eR, \qquad \frac{dI}{dt} = S - bI - cI, \qquad \frac{dR}{dt} = bI - eR,$$

with the stationary solution (obtained as done for Eq. (1)):

$$\left(\frac{S^*_{sat}}{N}, \frac{I^*_{sat}}{N}, \frac{R^*_{sat}}{N}\right) = \left(\frac{e(b + c)}{b + e(b + c + 1)}, \frac{e}{b + e(b + c + 1)}, \frac{b}{b + e(b + c + 1)}\right).$$

Therefore, we have $a_{sat} = 1/I^*_{sat}$, thus:

$$R_{0\,sat} = \frac{a_{sat} N}{b + c} = \frac{b + e(b + c + 1)}{e(b + c)}.$$

Using Eq. (2) to determine the values of $b$, $c$ and $e$, we have $R_{0\,sat} = 11$. Thus, the white thick dashed line in Fig. 10 is a curve fitted to the experimental points.

Finally, a distinct result is presented in Fig. 11, where $R_0$ is plotted as a function of the number of Recovered individuals ($R$) when the system reached the permanent regime. Here, $R_0$ increases when $R$ increases, and this result is corroborated by other related papers, since the qualitative disease parameters used are typical of diseases like mumps, chickenpox and measles (Monteiro, Chimara, & Berlinck, 2006), which also have a high $R_0$ and a high number of Recovered individuals in the population (Anderson & May, 1991). If we consider that the permanent regime of the system has $R_0 > 1$, i.e., the disease is active, $R_0$ can be approximated …

In this paper, we presented a method to understand disease propagation according to the most important topological parameters of four types of complex networks. The disease was modeled by a SIR model, the population by Erdős–Rényi, small-world, scale-free and Barabási–Albert networks, and the statistical process used to analyze the data was Principal Component Analysis. Based on the results, the following characteristics of epidemic outbreaks in populations emerged as the most important factors: average degree, peak of infected individuals, instant at which this peak occurs, number of recovered individuals in the system steady state and, of course, the basic reproduction number, $R_0$.
Topological parameters like the clustering coefficient and the shortest path length, which are often used to analyze disease spreading on networks (Dorjee et al., 2013; Keeling, 2005; Lennartsson, Håkansson, Wennergren, & Jonsson, 2012; Moslonka-Lefebvre et al., 2009; Oleś et al., 2014; Raymond & Hosie, 2009; Schimit & Monteiro, 2009), should not be used when many network models are considered or the model is unknown, though they are robust when the model is well defined. Therefore, considering that social networks may not be properly represented by a given model, and that modeling assumptions may not be correct, a careful choice of parameters for analyzing disease propagation must be made, as concluded in Shirley and Rushton (2005). Here, we presented some parameters to consider, like the average degree, the density and the number of Recovered individuals. Moreover, the results came from a wide range of networks: from highly concentrated connections, as in Barabási–Albert networks, to the Erdős–Rényi model, where connections are equally distributed over the population. Nevertheless, the average degree was an important topological parameter, as also noted in Colizza et al. (2007).

Lastly, the diversity of simulations made it possible to verify a saturation in the value of $R_0$, that is, a maximum value for $R_0$ given the epidemiological parameters, like the probability of recovering from the disease, the probability of dying due to the disease and the probability of dying from natural causes. Such saturation occurs when all Susceptible individuals get infected at each time step. A high value of $R_0$, most of the population in the Recovered state and almost all Susceptible individuals getting infected are characteristics of a well-known scenario for childhood diseases like mumps, chickenpox and measles if an age-stratified population is considered (Wallinga, Teunis, & Kretzschmar, 2006).
Future work should address the following questions:
• Is the PCA approach used here suitable for other disease models, as well as for populations modeled by another multi-agent environment, like cellular automata (Holko et al., 2016)?
• Is the PCA approach suitable for other uses of populations, like evolutionary algorithms (Bajer et al., 2016; Chang et al., 2005; Li et al., 2009) and general population dynamics (Simidjievski et al., 2015)?
• Considering mathematical epidemiology, the inclusion in the model of methods to control the spread of the disease could reveal the most effective ones for combating it. Vaccination and limiting contacts between individuals should be tested;
• The calculation of $R_0$ is usually difficult in the first cases of a disease outbreak (Mossong & Muller, 2000). The PCA model could be used in the initial transient of the disease, with partial information, to return the most important variables to consider when approximating the value of $R_0$.

A new hybrid approach for feature selection and support vector machine model selection based on self-adaptive cohort intelligence
Statistical mechanics of complex networks. Reviews of Modern Physics
Infectious diseases of humans: Dynamics and control. Oxford science publications
A population initialization method for evolutionary algorithms based on clustering and Cauchy deviates
Modeling the spatial spread of infectious diseases: The global epidemic and mobility computational model
The impact of past epidemics on future disease dynamics
Emergence of scaling in random networks
Network analysis of Danish cattle industry trade patterns as an evaluation of risk potential for disease spread
Complex networks: Structure and dynamics
The degree sequence of a scale-free random graph process
Two-phase sub population genetic algorithm for parallel machine-scheduling problem
Epidemic modeling in complex realities
Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations
The iGraph software package for complex network research. InterJournal, Complex Systems
Network analysis of swine shipments in Ontario, Canada, to support disease spread modelling and risk-based disease management
Evaluation of expert system for condition monitoring of a single point cutting tool using principle component analysis and decision tree algorithm
On random graphs
Metapopulation dynamics as a contact process on a graph
Feature selection in principal component analysis of analytical data
An introduction to variable and feature selection
Epidemiological modeling with a population density map-based cellular automata simulation system
Modelling disease spread and control in networks: implications for plant sciences
Principal component analysis
The implications of network structure for epidemic dynamics
Correlation models for childhood epidemics
A contribution to the mathematical theory of epidemics
Complex network tools in building expert systems that perform framing analysis
SpecNet: A spatial network algorithm that generates a wide range of specific structures
Research of multi-population agent genetic algorithm for feature selection
Feature selection using principal feature analysis
Network structure and the biology of populations
Big cities: Shelters for contagious diseases
Continuous and discrete approaches to the epidemiology of viral spreading in populations taking into account the delay of incubation time
Epidemics and percolation in small-world networks
Epidemic outbreaks in complex heterogeneous networks
SIS along a continuum (SIS(c)) epidemiological modelling and control of diseases on directed trade networks
Disease spread in small-size directed networks: Epidemic threshold, correlation between links to and from nodes, and clustering
Estimation of the basic reproduction number of measles during an outbreak in a partially vaccinated population
Networks: An introduction
Understanding disease control: Influence of epidemiological and economic factors
Cost-benefit analysis of epidemics spreading on clustered random networks
Social network analytics for churn prediction in Telco: Model building, evaluation and network architecture
Threshold parameters for a model of epidemic spread among households and workplaces
Vulnerability of animal trade networks to the spread of infectious diseases: A methodological approach applied to evaluation and emergency control strategies in cattle
Climate and landscape factors associated with Buruli ulcer incidence in
Network-based exploration and visualisation of ecological data
Models of infectious disease
On representing network heterogeneities in the incidence rate of simple epidemic models
Percolation on disordered networks as a model for epidemics
On the basic reproduction number and the topological properties of the contact network: An epidemiological study in mainly locally connected cellular automata
The impacts of network topology on disease spread
Predicting long-term population dynamics with bagging and boosting of process-based models
Epidemic dynamics on complex networks
Impact of spatial clustering on disease transmission and optimal control
On analytical approaches to epidemics on networks
Quantifying the spatial dimension of dengue virus epidemic spread within a tropical urban environment
Recurrent epidemics in small world networks
Analyzing natural human language from the point of view of dynamic of a complex network
Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents
Evolution of scaling emergence in large-scale spatial epidemic spreading
Collective dynamics of small-world networks
Walking the dog: Exploration of the contact networks between dogs in a community
Modelling disease spread in dispersal networks at two levels
Simulation of the spread of infectious diseases in a geographical environment

PHTS is partially supported by grants #303743/2016-6 and #402874/2016-1 of Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and grant #2017/12671-8, São Paulo Research Foundation (FAPESP).