key: cord-0007573-d57zqixs authors: da Fontoura Costa, Luciano; Sporns, Olaf; Antiqueira, Lucas; das Graças Volpe Nunes, Maria; Oliveira, Osvaldo N. title: Correlations between structure and random walk dynamics in directed complex networks date: 2007-07-30 journal: Appl Phys Lett DOI: 10.1063/1.2766683 sha: e483fdbda7f46cb9d94fdc826a6f78bea79ddf7c doc_id: 7573 cord_uid: d57zqixs
In this letter the authors discuss the relationship between structure and random walk dynamics in directed complex networks, with an emphasis on identifying whether a topological hub is also a dynamical hub. They establish the necessary conditions for networks to be topologically and dynamically fully correlated (e.g., word adjacency and airport networks), and show that in this case Zipf’s law is a consequence of the match between structure and dynamics. They also show that real-world neuronal networks and the world wide web are not fully correlated, implying that their more intensely connected nodes are not necessarily highly active. © 2007 American Institute of Physics. [DOI: 10.1063/1.2766683]
We address the relationship between structure and dynamics in complex networks by taking the steady-state distribution of the frequency of visits to nodes (a dynamical feature), obtained by performing random walks [1] along the networks.
A complex network [2-5] is taken as a graph with directed edges and associated weights, represented by the weight matrix W. The N nodes in the network are numbered i = 1, 2, ..., N, and a directed edge with weight M, extending from node j to node i, is represented as W(i, j) = M. No self-connections (loops) are considered. The in and out strengths of a node i, abbreviated as is(i) and os(i), are the sums of the weights of its inbound and outbound connections, respectively. The stochastic matrix S for such a network is $S(i, j) = W(i, j)/\mathrm{os}(j)$. The matrix S is assumed to be irreducible; i.e., every node can be reached from any other node, which allows the definition of a unique and stable steady state. An agent placed at any initial node j follows one of the outbound edges of node j, moving to node i with probability S(i, j). This step is repeated a large number of times T, and the frequency of visits to each node i is calculated as v(i) = (number of visits to node i during the walk)/T. In the steady state (i.e., after a long time period T), v = Sv, and the frequency of visits to each node along the random walk may be calculated from the eigenvector associated with the unit eigenvalue (e.g., Ref. 6). For proper statistical normalization we set $\sum_p v(p) = 1$. The dominant eigenvector of the stochastic matrix has been verified, theoretically and experimentally, to be remarkably similar to the corresponding eigenvector of the weight matrix, implying that the adopted random walk model shares several features with other types of dynamics, including linear and nonlinear summations of activations and flow in networks. In addition to providing a modeling approach intrinsically compatible with dynamics involving successive visits to nodes by one or more agents, as is the case with world wide web (WWW) navigation, text writing, and transportation systems, random walks are directly related to diffusion.
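The construction above can be illustrated with a minimal sketch. It assumes a small, hypothetical 4-node weight matrix (not one of the networks studied in the letter), builds S by dividing each column of W by the out strength of its source node, and obtains the steady state in two ways: by iterating v = Sv and by simulating a long walk. The iteration is applied to the "lazy" matrix (S + I)/2, a standard device assumed here for convenience; it has the same steady state as S and avoids oscillations in periodic networks.

```python
import random

def stochastic_matrix(W):
    """Column-normalize W: S[i][j] = W[i][j] / os(j), where os(j) = sum_i W[i][j]."""
    n = len(W)
    out_strength = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / out_strength[j] for j in range(n)] for i in range(n)]

def steady_state(S, tol=1e-13, max_iter=100_000):
    """Solve v = S v by power iteration on the lazy matrix (S + I)/2."""
    n = len(S)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        Sv = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_v = [0.5 * (v[i] + Sv[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

def simulate_walk(S, steps, seed=0):
    """Frequency of visits v(i) = (visits to node i) / T for a walk of T steps."""
    rng = random.Random(seed)
    n = len(S)
    node = 0
    visits = [0] * n
    for _ in range(steps):
        column = [S[i][node] for i in range(n)]  # transition probabilities out of `node`
        node = rng.choices(range(n), weights=column)[0]
        visits[node] += 1
    return [count / steps for count in visits]

# Hypothetical toy network; W[i][j] is the weight of the edge from node j to node i.
W = [
    [0, 1, 2, 0],
    [3, 0, 0, 1],
    [0, 2, 0, 1],
    [1, 0, 1, 0],
]
S = stochastic_matrix(W)
v_exact = steady_state(S)
v_walk = simulate_walk(S, steps=200_000)

for i, (a, b) in enumerate(zip(v_exact, v_walk)):
    print(f"node {i}: eigenvector {a:.4f}, simulated walk {b:.4f}")
```

The two estimates should agree to within the sampling error of the simulated walk, which shrinks as the number of steps T grows.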
As time progresses, the frequency of visits to each network node approaches the activity values that would be obtained from the traditional diffusion equation. Full congruence between such frequencies and activity diffusion is obtained at the equilibrium state of the random walk process. Diffusion, in turn, plays an important role in a large number of linear and nonlinear dynamic systems, including disease spreading and pattern formation. Random walks are also intrinsically connected to Markov chains, electrical circuits, flows in networks, and even dynamical models such as the Ising model. For such reasons, random walks have become one of the most important and general models of dynamics in physics and other areas, constituting a primary choice for investigating dynamics in complex networks. The correlations between activity (the frequency of visits to nodes, v) and topology (out strength os or in strength is) can be quantified in terms of the Pearson correlation coefficient r. For full activity-topology correlation in directed networks, i.e., |r| = 1 between v and os or between v and is, it is sufficient that (i) the network is strongly connected, i.e., S is irreducible, and (ii) for every node, the in strength equals the out strength. The proof of this statement is as follows. Because the network is strongly connected, its stochastic matrix S has a unique steady-state eigenvector associated with the unit eigenvalue, i.e., v = Sv. Since S(i, j) = W(i, j)/os(j), the ith element of the vector S os is given as $(S\,\mathrm{os})(i) = \sum_j S(i, j)\,\mathrm{os}(j) = \sum_j \frac{W(i, j)}{\mathrm{os}(j)}\,\mathrm{os}(j) = \sum_j W(i, j) = \mathrm{is}(i)$. By hypothesis, is(i) = os(i) for any i and, therefore, both os and is are eigenvectors of S associated with the unit eigenvalue. Then os = is = v up to the normalization of v, implying full correlation between the frequency of visits and both in and out strengths.
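The sufficient conditions (i) and (ii) can be checked numerically on a small case. The sketch below assumes a hypothetical 4-node network built as a superposition of weighted directed cycles, which guarantees that every node has equal in and out strength. It then compares the steady-state visit frequencies with the normalized out strengths and computes the Pearson coefficient. The lazy power iteration is an implementation convenience assumed here, not a step taken from the letter.

```python
from math import sqrt

def strengths(W):
    """Return (in strengths, out strengths); W[i][j] is the weight of edge j -> i."""
    n = len(W)
    in_s = [sum(W[i][j] for j in range(n)) for i in range(n)]
    out_s = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return in_s, out_s

def steady_state(W, tol=1e-13, max_iter=100_000):
    """Steady state of the walk with S(i, j) = W(i, j) / os(j), via lazy power iteration."""
    n = len(W)
    _, out_s = strengths(W)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        Sv = [sum(W[i][j] / out_s[j] * v[j] for j in range(n)) for i in range(n)]
        new_v = [0.5 * (v[i] + Sv[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical balanced network: cycle 0->1->2->3->0 (weight 2),
# cycle 0->2->0 (weight 3), and cycle 1->3->1 (weight 1).
W = [
    [0, 0, 3, 2],
    [2, 0, 0, 1],
    [3, 2, 0, 0],
    [0, 1, 2, 0],
]
in_s, out_s = strengths(W)
v = steady_state(W)
expected = [s / sum(out_s) for s in out_s]  # os normalized so that it sums to 1
r = pearson(v, out_s)

print("in strengths :", in_s)
print("out strengths:", out_s)
print("v            :", [round(x, 4) for x in v])
print("os / sum(os) :", [round(x, 4) for x in expected])
print("Pearson r    :", round(r, 6))
```

In this toy case the visit frequencies coincide with the normalized strengths, so r equals 1 up to rounding.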
An implication of this derivation is that for perfectly correlated networks, the frequency of symbols produced by random walks will be equal to the out strength or in strength distributions. Therefore, a network that is scale-free [3] in its out strength must produce sequences obeying Zipf's law [7], and vice versa. If, on the other hand, the strength distribution is Gaussian, the frequency of visits to nodes will also be Gaussian; that is to say, the distribution of node strengths is replicated in the node activation. Although the correlation between node strength and random walk dynamics in undirected networks has been established before [8] (including full correlation [9,10]), the findings reported here are more general since they apply to any directed weighted network, such as the WWW and the airport network. Indeed, the correlation conditions for undirected networks can be understood as a particular case of the conditions above. A fully correlated network will have |r| = 1. We obtained r = 1 for texts by Darwin [11] and Wodehouse [12] and for the network of airports in the USA [13]. The word association network was obtained by representing each distinct word as a node, while the edges were established by the sequence of immediately adjacent words in the text after the removal of stopwords [14] and lemmatization [15]. More specifically, the fact that word U has been followed by word V a total of M times in the text is represented as W(V, U) = M. Zipf's law is known to apply to this type of network [16]. The airport network has a link between two airports if there exists at least one flight between them. The number of flights performed in one month was used as the strength of the edges. We obtained r for various real networks (Table I), including the fully correlated networks mentioned above. To interpret these data, we recall that a small r means that a hub in topology (large in or out strength) is not necessarily a center of activity.
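The word network construction W(V, U) = M can be sketched in a few lines. The token list below is hypothetical and is assumed to be already stripped of stopwords and lemmatized. The sketch also illustrates an observation that is not spelled out in the text: in a network built from one continuous token sequence, every occurrence of a word is entered once and left once, so in and out strengths can differ only for the first and last tokens. This is one plausible reading of why such networks satisfy condition (ii) almost exactly.

```python
from collections import Counter

def adjacency_weights(tokens):
    """W[(v, u)] = number of times word u is immediately followed by word v."""
    return Counter((v, u) for u, v in zip(tokens, tokens[1:]))

def strength_imbalance(tokens):
    """Map each word to os(word) - is(word), keeping only nonzero differences."""
    W = adjacency_weights(tokens)
    in_s, out_s = Counter(), Counter()
    for (v, u), m in W.items():
        out_s[u] += m  # edge leaves u
        in_s[v] += m   # edge enters v
    diff = {w: out_s[w] - in_s[w] for w in set(tokens)}
    return {w: d for w, d in diff.items() if d != 0}

# Hypothetical preprocessed text (stopwords removed, words lemmatized).
closed = ["worm", "eat", "leaf", "worm", "form", "mould", "leaf", "fall", "worm"]
open_ended = closed[:-1]  # same text without the final token

W = adjacency_weights(closed)
print(W[("eat", "worm")])              # "worm" followed by "eat": 1
print(W[("worm", "leaf")])             # "leaf" followed by "worm": 1
print(strength_imbalance(closed))      # {} : first and last tokens coincide
print(strength_imbalance(open_ended))  # only "worm" (+1) and "fall" (-1) are unbalanced
```

For a long text the two boundary tokens contribute a negligible imbalance, so the network is balanced for practical purposes.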
Notably, in all cases considered r is greater for the in strength than for the out strength. This may be understood with a trivial example of a node from which a high number of links emerge (implying large out strength) but which has only very few inbound links. In a random walk model, this node will rarely be occupied and thus cannot be a center of activity, though it will strongly affect the rest of the network by sending activation to many other targets. Understanding why a hub in terms of in strength may fail to be very active is more subtle. Consider a central node receiving links from many other nodes arranged in a circle, so that the central node has a large in strength while the surrounding nodes have small in strength. If a node i receives several links from nodes with low activity, node i will likewise be fairly inactive. In order to further analyze the latter case, we may examine the correlations between the frequency of visits to each node i and the cumulative hierarchical in and out strengths of that node. The hierarchical degree [17-19] of a network node provides a natural extension of the traditional concept of node degree.
TABLE I. Number of nodes (No. nodes), number of edges (No. edges), means and standard deviations of the clustering coefficient (CC), cumulative hierarchical in strengths for levels 1-4 (IS1-IS4), cumulative hierarchical out strengths for levels 1-4 (OS1-OS4), and the Pearson correlation coefficients between the activation and all cumulative hierarchical in strengths and out strengths (r_IS1 to r_OS4) for the complex networks considered in the present work.
For the least correlated network analyzed, viz., the largest strongly connected cluster in the network of WWW links in the domain of Ref. 21 (Massey University, New Zealand) (Refs. 22 and 23), activity could not be related to in strength at any hierarchical level.
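The example of a node with large out strength but few inbound links can be made concrete with a small sketch. The network below is hypothetical: 20 nodes form a ring with strong edges (weight 10), and an extra node sends a weight-1 edge to every ring node while receiving a single weight-1 edge from ring node 0. The steady state is computed by lazy power iteration, an implementation choice assumed here for convenience.

```python
def steady_state(W, tol=1e-12, max_iter=20_000):
    """Steady state of the walk with S(i, j) = W(i, j) / os(j), via lazy power iteration."""
    n = len(W)
    out_s = [sum(W[i][j] for i in range(n)) for j in range(n)]
    v = [1.0 / n] * n
    for _ in range(max_iter):
        Sv = [sum(W[i][j] / out_s[j] * v[j] for j in range(n)) for i in range(n)]
        new_v = [0.5 * (v[i] + Sv[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

n_ring = 20
hub = n_ring                 # index of the node with many outbound links
size = n_ring + 1
W = [[0.0] * size for _ in range(size)]  # W[i][j] is the weight of edge j -> i
for k in range(n_ring):
    W[(k + 1) % n_ring][k] = 10.0  # strong ring edge k -> k+1
    W[k][hub] = 1.0                # hub -> k
W[hub][0] = 1.0                    # the hub's only inbound link, 0 -> hub

out_s = [sum(W[i][j] for i in range(size)) for j in range(size)]
in_s = [sum(W[i][j] for j in range(size)) for i in range(size)]
v = steady_state(W)

print("hub out strength:", out_s[hub], "| largest ring out strength:", max(out_s[:n_ring]))
print("hub in strength :", in_s[hub])
print("hub activity    :", round(v[hub], 5), "| smallest ring activity:", round(min(v[:n_ring]), 5))
```

In this toy network the node with the largest out strength is also the least visited one, since the walker can only reach it through a single weak link.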
Because the Pearson coefficient is a single real value, it cannot adequately express the many coexisting relationships between activity and degree in this specific network, nor its possibly heterogeneous topology. Very similar results were obtained for other WWW networks, which indicates that the reasons why topological hubs are not highly active cannot be identified at the present moment (see, however, the discussion of more highly correlated networks below). However, for the two neuronal structures of Table I that are not fully correlated (the network defined by the interconnectivity between cortical regions of the cat [24] and the network of synaptic connections in C. elegans [25]), activity was shown to increase with the cumulative first and second hierarchical in strengths. In the cat cortical network, each cortical region is represented as a node, and the interconnections are represented by the network edges. Notably, a previous paper [26] showed that when connections between cortex and thalamus were included, the correlation between activity and outdegree increased significantly. This could be interpreted as a result of increased efficiency, with the topological hubs becoming highly active. Furthermore, for the fully correlated networks, such as the word associations obtained for texts by Darwin and Wodehouse, activity increased approximately with the square of the cumulative second hierarchical in strength (see Supplementary Fig. 2 in Ref. 20). In addition, the correlations obtained for these two authors are markedly distinct, as the work of Wodehouse is characterized by a substantially steeper increase in the frequency of visits for large in strength values (see Supplementary Fig. 3 in Ref. 20). Therefore, the results for higher cumulative hierarchical degrees may serve as a feature for authorship identification.
In conclusion, we have established (i) a set of conditions for full correlation between topological and dynamical features of directed complex networks and demonstrated that (ii) Zipf's law can be naturally derived for fully correlated networks. Result (i) is of fundamental importance for studies relating dynamics and connectivity in networks, with critical practical implications. For instance, it not only demonstrates that hubs of connectivity may not correspond to hubs of activity but also provides a sufficient condition for achieving full correlation. Result (ii) is also of fundamental importance as it relates two of the most important concepts in complex systems, namely, Zipf's law and scale-free networks. Although both involve power laws, these two key concepts had been extensively studied separately. The result reported in this work paves the way for important additional investigations, especially by showing that Zipf's law may be a consequence of dynamics taking place in scale-free systems. In the cases where the network is not fully correlated, the Pearson coefficient may be used as a characterizing parameter. For a network with very small correlation, such as the WWW links between the pages in a New Zealand domain analyzed here, the reasons for hubs failing to be active could not be identified, probably because of the substantially higher complexity and heterogeneity of this network, including varying levels of clustering coefficients, as compared to the neuronal networks.
This work was financially supported by FAPESP and CNPq (Brazil). Luciano da F. Costa acknowledges grants 05/00587-5 (FAPESP) and 308231/03-1 (CNPq).
1 Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues (Springer)
The Formation of Vegetable Mould through the Action of Worms, with Observations on their Habits (Murray)
The Pothunters (A & C Black)
Bureau of Transportation Statistics: Airline On-Time Performance Data
Modern Information Retrieval (Addison-Wesley)
The Oxford Handbook of Computational Linguistics (Oxford)
Human Behaviour and the Principle of Least Effort (Addison-Wesley)