key: cord-0689302-3081c691 authors: Gao, Ting-Ting; Yan, Gang title: Autonomous inference of complex network dynamics from incomplete and noisy data date: 2022-02-09 journal: Nature Computational Science DOI: 10.1038/s43588-022-00217-0 sha: cf5da0e8bb13dc99a0d2d25a20fb3322529c3242 doc_id: 689302 cord_uid: 3081c691 The availability of empirical data that capture the structure and behavior of complex networked systems has been greatly increased in recent years, however a versatile computational toolbox for unveiling a complex system's nodal and interaction dynamics from data remains elusive. Here we develop a two-phase approach for autonomous inference of complex network dynamics, and its effectiveness is demonstrated by the tests of inferring neuronal, genetic, social, and coupled oscillators dynamics on various synthetic and real networks. Importantly, the approach is robust to incompleteness and noises, including low resolution, observational and dynamical noises, missing and spurious links, and dynamical heterogeneity. We apply the two-phase approach to inferring the early spreading dynamics of H1N1 flu upon the worldwide airline network, and the inferred dynamical equation can also capture the spread of SARS and COVID-19 diseases. These findings together offer an avenue to discover the hidden microscopic mechanisms of a broad array of real networked systems. From two-photon calcium imaging of neuronal activities 1,2 , high-throughput genetic experiments 3, 4 to digital recordings of human mobility [5] [6] [7] , our ability to observe the dynamic behavior of nodes in complex biological, social and technological systems has advanced spectacularly in the past years. The collected observations, often in the form of time-series data, allow us to extract the dynamic patterns of a system's individual nodes. To gain meaningful insights on the system, however, such a reductionist approach of tracking all individual nodes is insufficient. Indeed, complex system behavior emerges not just from the single nodes, but rather from the dynamic interactions between the nodes 6, [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] . This requires us to infer complex network dynamics, i.e. to retrieve both self nodal dynamics and interaction dynamics from the accumulating data of network topological structure and nodes' activities. The balance of self vs. interaction dynamics is most naturally captured by a general equation that tracks the activities of all nodes via 9 where x i (t) ≡ (x i,1 (t), . . . , x i,d (t)) is node i's d-dimensional activity, representing, e.g. the membrane potential of a neuron in a brain network 9, 12 , the proportion of infected people in a country or region [5] [6] [7] or the state of a component in an oscillator network 19 . These activities are driven by the self-regulation function F(x i ) ≡ (F 1 (x i ), . . . , F d (x i )) , designed to describe the dynamics of all nodes in isolation, and by the pairwise function G(x i (t), x j (t)) ≡ (G 1 (x i , x j ), . . . , G d (x i , x j )) which captures the dynamic mechanisms of interaction between the nodes. Finally, the network A ij , an n × n adjacency matrix, denotes the influence or flow from node j to i, where n is the number of nodes in the system. As shown by Barzel and Barabási, with appropriate choices of the nonlinear functions F and G, Eq. (1) is able to describe a broad range of complex systems 9 . However, for most real systems, the functions F and G are unknown. Hence, a pressing lacuna in the study of complex systems is a versatile computational toolbox for automatically inferring Eq. (1) from the observed data of network topology A ij and node activities x i (t). Complex biological, social or technological systems lack the fundamental physical rules that govern particle systems, so we do not have a priori knowledge of their internal microscopic mechanisms 20 . Therefore, the goal is not to only identify the model's parameters but rather to retrieve the forms of F and G and infer the explicit model itself. Despite the recent significant progress in developing methods to infer the governing equations of single-or few-body dynamics 21-27 , the task of inferring network dynamics poses particular challenges. For example, F and G are usually of different types hence one cannot obtain their compact forms if using only orthogonal basis functions 22,23,28,29 ; Node activities data are noisy and the mappings of network topologies are usually incomplete 30,31 ; Collective behavior, such as synchronization and consensus 19 , can conceal the specific forms of microscopic mechanisms in interaction dynamics. To overcome these challenges we propose here a two-phase inference approach. Our analysis indicates that the two-phase strategy allows us to achieve efficient and, most importantly, highly accurate inference, even in the face of unfavorable scenarios, such as noisy or low-resolution data or an only partially mapped topology (Fig. 1a) . Overview of the two-phase inference approach Lacking a priori knowledge of the structure of F and G, a natural approach is to pre-construct two extensive libraries L F and L G that contain a variety of elementary functions. The combinations of these elementary functions can potentially generate the true network dynamics. In this work, the libraries contain not only orthogonal basis functions but include polynomials, trigonometric, exponential, fractional, rescaling, sigmoid and other activation functions that frequently used in various domains (see Supplementary Tables 1 and 2 ). Large libraries are helpful for finding a compact and optimal model to capture network dynamics but also make the inference problem more difficult, because, due to the lack of orthogonality, the elementary functions can be similar with each other and thus less discriminative. Introducing the time series data x i (t), where i = 1, 2, . . . , n, into L F and L G , we obtain two time-varying matrices Θ F (t) ≡ L F (x i (t)) and Θ G (t) ≡ L G (x i (t), x j (t)) that encode the patterns of node activities imposed by the elementary functions in L F and L G (Fig. 1b) . Then the inference problem can be recast to selecting appropriate patterns in Θ F (t) and Θ G (t) that best match the evolution of observed system stateẋ(t), i.e. to inferring the sparse coefficients ξ F and ξ G that best solvė where A ≡ A ⊗ I d , Θ F ≡ Θ F ⊗ I d , Θ G ≡ Θ G ⊗ I d , symbol ⊗ denotes Kronecker product and I d is the ddimensional identity matrix. Here, we consider the general setting where each node state is d-dimensional and the network is directed and heterogeneous. Consequently, the problem of inferring complex network dynamics is high-dimensional and irreducible. Indeed, the number of elementary functions in L F and L G is approximately 25, 80 or 140 when node activity itself has one, two, or three dimensions respectively in the simulation validations below (see Supplementary Tables 1 and 2) . Our approach is a two-phase procedure consisting of global regression and local fine-tuning. In phase I, we approximate the derivativesẋ(t) (see Methods) and calculate the matrices Θ F (t) and Θ G (t), and then normalize each column of them (Fig. 1b) . These normalized data are used to identify, through regression, the leading elementary functions that are most probably constituents of the true F and G (Fig. 1c , and see also Methods). Phase I is able to significantly narrow down the model space but the dynamical equation inferred by such regression alone lacks generative power (Fig. 1d) . Next, in phase II, we perform fine-tuning with the original values ofẋ(t), Θ F (t) and Θ G (t), i.e. without normalization. We use topological samplings (see Methods) and the weighted Akaike's information criterion (wAIC, see Methods) to sequentially remove the elementary functions with smallest inferred coefficients (Fig. 1e) . The final sets of elementary functions and their coefficientsξ F andξ G composeF andĜ, leading to the inferred dynamics of complex networks (Fig. 1f ). To validate the effectiveness of our approach we apply it to infer five network dynamics, including Hindmarsh-Rose 32 (HR, d = 3) and FitzHugh-Nagumo 32 (FHN, d = 2) neuronal systems, social balance dynamics 33 (SB, d = 1), Kuramoto dynamics 34 (d = 1), and coupled heterogeneous Rössler oscillators 35 (d = 3), here d is the dimension of each node activity. To obtain the nodes' activities data we respectively simulate these dynamics (Supplementary Table 4 ) on a variety of toplogies, including Erdős-Rényi (ER) and scale-free (SF) synthetic networks, and five empirical networks -cellular-level brain networks of C. elegans and Drosophila, Advogato social network, power grids of North-Europe and United States. The time series of node activities and each network topology are the input data to our approach. The five specific equations governing these dynamics are the ground truths that we aim to infer. These dynamical models and networks are widely used in various domains and exhibit different properties (see Supplementary Information Sections II and III) , accounted for the diversity of our tests. Figure 2 illustrates the procedure of inferring FitzHugh-Nagumo neuronal network dynamics. Through the global regression, Phase I identifies ten most relevant elementary functions for each dimension of FHN (Fig. 2b ), and then, by local fine-tuning, Phase II autonomously learns the compact and optimal form of the dynamical equation as well as the most appropriate coefficient for each of the necessary elementary functions (Fig. 2c ). The form of the inferred equation in Fig. 2c perfectly matches the ground truth in Fig. 2a and the learnt coefficients are also highly accurate. Indeed, the relative errors ∆ = (ξ −ξ)/ξ, where ξ andξ are true and learnt coefficients respectively, are smaller than 3% (Fig. 2d) . The dynamical equation inferred by our approach exhibits generative power, being able to generate nodes' activities and trajectories that agree well with observation data (Fig. 2 (e,f)). Our approach also successfully infers the equations governing other four network dynamics. Regarding the accuracy of learnt coefficients, the relative errors |∆| < 3% for the Hindmarsh-Rose dynamics (Fig. 3a ) and the edge dynamics (Fig. 3c ) on both synthetic and empirical networks. In Kuramoto dynamics and coupled heterogeneous Rössler oscillators the self-dynamics are nonidentical, i.e. each node's dynamics has its own form (see Supplementary Information Section III). Hence we aim to infer an effective form of Eq. (1) that minimizes the inconsistency between inferred and true nodes' activities. Even for these more challenging cases, the two-phase approach still succeeds with relative coefficient errors |∆| < 5% or < 20% (Fig. 3(e,g) ). Both activities and trajectories generated by the effective equations exhibit high agreement with the true averaging dynamics ( Fig. 3(f,h,i) ). Whether a network dynamics is inferrable depends on several factors. Here we explore three key factors, namely synchronized dynamics, dynamical heterogeneity, and deficient libraries. Synchronized dynamics. If a network is completely synchronized, i.e. all of its nodes behave in the same manner 19, 34, 35 , distinguishing the activities of a node and its neighbors becomes impossible and the microscopic interacting mechanism G(x i , x j ) between nodes will be cloaked and undiscoverable. In other words, more synchronized is a network, more difficult to infer its dynamics. Here we tune the coupling strength between nodes to change the degree of network synchronization (i.e. the order parameter R , see Supplementary Information Section IV), and test the capability of our two-phase approach in inferring partially synchronized network dynamics. As shown in Fig. 4a , although the inference inaccuracy increases when the system becomes more synchronized, our approach still can infer the true FHN equation even when the network is highly synchronized ( R ≈ 0.7). The inference inaccuracy is quantified by symmetric mean absolute percentage error (sMAPE, see Methods). The more accurate the inference result, the closer the sMAPE value to zero. Dynamical heterogeneity. Equation (1) assumes that nodes have the same form F of self-dynamics, yet this is not always true. For instance, although the self-dynamics of Kuramoto model is simply one elementary function ω representing the natural frequency of a node, different nodes can have different values of ω. For such nonidentical self-dynamics it is difficult, if not impossible, to infer a specific form F i (x i ) for each node i due to an n-fold increase in the dimensionality of potential model space (n is network size). Therefore we aim to infer an effective equation that best captures the averaging dynamics, as shown in Fig. 3 (e,g). Here we further explore to what extent of dynamical heterogeneity our approach can tolerate. To do so we assign each node a value of ω randomly drawn from a normal distribution N (0, σ) and increase the standard deviation σ. The inference inaccuracy indeed increases when σ becomes larger, and the two-phase approach can tolerate dynamical heterogeneity σ ≤ 0.5 (Fig. 4b ). Deficient libraries. Although two rather comprehensive libraries of elementary functions are built, it is still possible that some elementary functions of the true unknown dynamics are missing. Another possibility is that the compact form of true dynamics cannot be composed by these elementary functions. For these cases, our two-phase approach will infer an alternative equation to capture the system behaviors. We test such capability in gene regulation and Hindmarsh-Rose neuronal dynamics whose true coupling functions are intentionally removed from L G . As shown in Fig. 4c , the trajectories generated by the inferred and the true equations are close to each other, and the discrepancy is small for all nodes (see Methods and also Supplementary Information Section IV-B). Incompleteness of mapped network topology and noises of observed nodes' activities are inevitable in real data 30,31 . Hence, here we validate the robustness of our two-phase approach against low resolution, dynamical and observational noises, spurious and missing links, as well as through comparisons with previous methods 23, 36, 37 . Low resolution. Experimental and digitally recording technologies often have limited measurement frequencies, inducing low resolution of observed time series. To validate our approach's robustness against low resolution we numerically simulate the five nonlinear network dynamics in Figs. 2 and 3 with step size 0.01, and then regularly down-sample the activities data. We calculate the failure ratios in inferring the form of true equations (Supplementary Figure 14a) and also the inference inaccuracies (Fig. 5a) . The results show that the two-phase strategy requires only a proportion of 5% to 50% data for the inference. Observational and dynamical noises. Observational noises are induced by the measuring process and dynamical noises represent the intrinsic stochasticity in dynamics. To produce the former we add Gaussian noises to nodes' activity data and quantify the intensity of observational noise with signal-to-noise-ratio (SNR, see Supplementary Information Section V-A); To imitate the latter we add a stochastic term of Gaussian white noise with intensity η into the true dynamical equations and generate the nodes' activities data by numerical simulations of these stochastic differential equations (see Supplementary Information Section V-A). We test the impact of these two types of noises on the performance of the two-phase inference approach, without any denoising preprocess. As shown in Fig. 5b and Supplementary Figure 14b , the approach can tolerate dynamical noise with η ≤ 0.15, meaning that it successfully reconstructs the hidden equations when the stochastic intensity is not higher than 15% of the average amplitude of true deterministic dynamics. Moreover, the approach can tolerate 30-dB observational noise ( Fig. 5c and Supplementary Figure 14c ). Spurious and missing links. Spurious and missing links in real data induce an incomplete network topology A ij , which further lead to an inaccurate interaction matrix Θ G . To test the impact of these erroneous links we randomly add or remove a fraction of links from the true network topology that was used to simulate the nodes' activities. Owing to the topological sampling in Phase II, our approach is able to tolerate 25% spurious and 30% missing links Comparison with previous methods. Two most illuminating and effective methods for dynamics inference are SINDy 23 and ARNI 37 . Note that ARNI originally aimed at inferring network topology but can be transferred to infer network dynamics by minor modification (see Supplementary Information Section V-C). Here we compare our approach with SINDy and ARNI from different aspects, including the amount of required data (Fig. 5f ), the robustness against observational noise (Fig. 5g) , correlated dynamical noise ( Fig. 5h and also Supplementary Information Section V-A), missing links (Fig. 5i) , and different network sizes (Fig. 5j) . While ARNI needs fewer data points if network topologies are complete and nodal activities do not have any noise (Fig. 5f ), the two-phase approach outperforms both SINDy and ARNI in inferring complex network dynamics from incomplete and noisy data ( Fig. 5g-j) . We also perform comparisons with SINDy's variant 36 regarding partially synchronized or heterogeneous dynamics (Supplementary Figures 13 and 17) . These results indicate that our approach can better handle high-dimensional networked systems and better cope with incompleteness and noises in data. Ablation studies. Besides the two-phase strategy, our approach also involves three important components, namely, normalization in the first phase yet non-normalization in the second for solving the issue raised by highly skewed observations at different nodes, topological sampling for imitating the feature of observed incomplete topologies, and optimal selection by wAIC for determining the most appropriate complexity of inferred dynamics. The essentiality of the two-phase strategy and the three components is demonstrated by ablation studies. Specifically, we ablate each phase or component and then assess the performance of degenerated approaches. As shown in Fig. 5 (k,l) and Supplementary Information Section V-B, the inference inaccuracy sMAPE indeed significantly increases if the phases or components are individually ablated. To demonstrate the approach's ability of handling empirical systems, we apply it to infer the spreading dynamics of infectious disease H1N1. The network underlying this diffusion system is the worldwide airline network, which captures the human mobility between different countries or regions and plays a dominant role for global disease spreading 5, 6 . Each entry A ij of the weighted network's adjacency matrix A represents the traffic volume from node j to i, where each node denotes a country or region. The total passengers daily are approximately Φ = 8.9 × 10 6 and, taking into account the population P i of each node i, the adjacency matrix is modified to The magnitude order of entries in matrix is around 10 −2 to 10 −3 . The nodal activities x i (t) are extracted from the daily reports of infected cases in each country or region. Here we consider the nodes whose accumulated H1N1 cases are more than 100 and focus on the early spreading dynamics, i.e. within the 45 days since the first case was reported in each node, which captures the system behaviors before government control. Based on these empirical data, our approach successfully infers a concise effective dynamical equation where a = 0.074 and b = 7.130 (see Supplementary Information Section VI and Supplementary Figure 18 ). It is interesting that our approach infers a sigmoid (nonlinear) form, rather than the linear form of epidemic models, to better capture the interaction dynamics. This might be caused by the fact that people usually consciously travel less if their countries/regions or the destinations have high infection risk. While Eq. 4 describes the dynamics of all nodes with the same parameters a and b, we also extend it by taking into account of dynamical heterogeneity in the nodes, i.e. to obtain a i and b i from each node i' activity data (see Fig. 6 (b-e) and also Supplementary Figure 18 ). Because empirical systems lack ground-truths, we verify the inferred Eq. 4 by testing its generalizability to SARS and COVID-19 diseases. Based on the daily reported numbers within the first 45 days in each node, we find that Eq. 4 is also able to capture the early spread of SARS and COVID-19 upon the worldwide airline network. Indeed, as shown in Fig. 6 (f-i) and Supplementary Figures 19-20 , the evolution of cumulative numbers of SARS cases (for nodes whose eventual infected cases are more than 100) and COVID-19 cases (for nodes whose eventual infected cases are more than 2000) agree well with the activities generated by Eq. 4 with heterogeneous parameters a i and b i . Many real networks have been mapped so far but there are still complex systems whose network structure information is totally missing. For the later cases, a possible scheme is inferring their topological structure, especially directed or causal networks 30,37-41 , from nodes' activities data first and then applying our approach to infer system dynamics. It is worth noting that inferring network structure from nodes' activities data is also challenging, especially when the number of nodes is large 42, 43 , because the number of parameters needing to be estimated is about n 2 where n is network size. Therefore, how to simultaneously infer both structure and dynamics of large complex systems is still an outstanding problem. Our work also raises several questions worthy of future pursuit. First, stochasticity in the dynamics of some real complex systems might be stronger than that we considered in this work. Such highly stochastic systems are better described by stochastic differential equations 29, [44] [45] [46] . Second, our approach does not account for discrete or boolean dynamics, or systems that contain thresholding terms or exhibit irregular dynamics with instability properties 47 . Third, when nodal activity is multi-dimensional, experimental access might be limited to a sub-dimension of the activity vector. The Koopman operator and time-delay embedding techniques are helpful for capturing the dynamical properties of sub-dimension observable systems 48 . Yet, the problem remains unsolved for complex networked systems. Finally, the nodes in a complex system can have higher-order, beyond pairwise, couplings and such higher-order interactions may significantly impact the dynamics of networked systems 49, 50 . Hence it is an interesting direction to extend the approach to inferring higher-order network dynamics. The left side of Eq. (1) represents the time-varying derivative of each node's activity, which can be numerically obtained from x i (t) through the five-point approximation 51 where δt is the time step. Hence the specific goal is to infer both the exact structure and the corresponding coefficients of the self-dynamics function F(x i (t)) and the interaction dynamics function G(x i (t), x j (t)). Because we lack a priori knowledge of the forms of F and G, hence we construct two comprehensive libraries, L F and L G , for self-and interaction dynamics respectively, including polynomial, trigonometric, exponential, fractional, rescaling, and various activation functions as listed in Supplementary Tables 1 and 2 . Introducing the observed time series of nodes' activities to the elementary functions in L F and L G , we obtain two matrices Θ F (t) = L F (x i (t)) and Θ G (t) = L G (x i (t), x j (t)) that describe the corresponding behaviors of these elementary functions (Supplementary Figure 1) . To infer the compact forms that best match Eq. (2) we propose a two-phase approach. Phase I, global regression. The purpose of this phase is to assess the relevance of each elementary function in L F and L G to the true, yet unknown, network dynamics. Given the observations of x i (t) for all i at time t we approximate the derivativesẋ(t) and calculate the matrices Θ F (t) and Θ G (t). These values are highly skewed and can span several orders of magnitude (Supplementary Figure 3) due to the skewness of node degrees and the nonlinearity of system dynamics, which could induce overestimation of the importance for inherently low-value constituents. To eliminate this severe effect it is crucial to normalize each column inẋ(t), Θ F (t) and Θ G (t). Then, the inference problem described by Eq. (2) is further recast to an optimization formula where λ > 0 is a hyper-parameter that regulates the sparsity of coefficient vectors ξ F and ξ G . We employ the regression analysis method of least absolute shrinkage and selection operator (i.e. lasso) to solve (6) and perform 5-fold validation to obtain the most appropriate value of λ (see Supplementary Information Section I-B Algorithm 1). The resultant ξ F and ξ G capture the relevance of each elementary function in L F and L G , enabling the identification of leading elementary functions that are most probably constituents of the true F and G (Fig. 1c) . Consequently, Phase I is able to significantly narrow down the model space. However, the dynamical equation inferred by such regression alone lacks generative power. For instance, as shown in Fig. 1d , the trajectory generated by an inferred dynamical equation of Phase I deviates from that of the true network dynamics. Phase II, local fine-tuning. In order to reconstruct generative and concise expressions for F and G we next perform fine-tunning in the reduced model space (see Supplementary Information Section I-B Algorithm 2). In contrast to Phase I we now use the original values ofẋ(t), Θ F (t) and Θ G (t), i.e. without normalization, to further identify the necessary elementary functions and learn their precise coefficients. Since spurious or missing links in the observed network topology have an adverse effect on the learning, we perform topological sampling (see Methods) that imitates the feature of observed, usually incomplete, topologies. Another issue is to determine the minimal number of elementary functions required for reconstructing F and G. To do so, we sequentially remove the elementary functions with smallest inferred coefficients and calculate, using a weighted version of Akaike's information criterion (wAIC, see Methods), the information inconsistency between the observed nodes' activities and the remaining set of elementary functions. This process stops when removing a certain elementary function consistently increases the value of wAIC. As shown in Fig. 1e , each curve in a plot at the left column represents the information inconsistency vs. model complexity for one topological sample. And we find that, indeed, the joint operation with wAIC and topological sampling are helpful for inference from noisy and incomplete data ( Fig. 5(k,l) ). The final sets of elementary functions and their coefficientsξ F andξ G compose the formsF andĜ, leading to the successful inference of network dynamics described by Eq. (1) . Indeed, as demonstrated in Fig. 1f , the trajectory generated by the inferred dynamical equation agrees well with the numerical simulations of the true network dynamics. It is worth noting that the ground truth, i.e. the form of the true equation, keeps unknown during the whole procedure and is only used to assess the accuracy of the final inferred results, hence our approach works in an autonomous unsupervised way. The original Akaike's Information Criterion (AIC) 52 is a frequently used method to balance the fitting and the complexity of a model with respect to the observed data, defined as AIC = n log MSE + 2p, where n is the number of observations, MSE is mean squared error of regression result of the model, and p is the number of variables. By AIC, one aims to select an optimal model that best fits the observations with the fewest variables from the model candidates. However, we find that the original AIC does not work well in the inference problem we aim to solve in the present work. Hence we introduce a weighted version of AIC (i.e. wAIC), to balance the fitting accuracy and the model complexity, where w is the inferred coefficient of a term from Phase I. A term with a larger w inferred by Phase I is more likely to be able to capture the properties of the underlying unknown dynamics. Thus, multiplying w or 1/w with AIC amplifies the impact of removing this term from the equation. The smaller the wAIC, the more consistent is the composition of the elementary functions with observed data and less important is this removed term. To be specific, to evaluate the relevance of a term i, we remove this term from the equation inferred by Phase I and calculate the value of wAIC i of the new shorter equation (Supplementary Figure 2) . We repeat this process to obtain the wAIC for each term. Then we sort these terms based on their wAIC values, and remove terms one by one with wAIC values from small to large. This operation gives a shorter equation at each step, and we calculate these shortened equations' AIC values. The optimal equation is determined at the tuning point where the curve starts consistently increasing, as indicated by pink stars in Fig. 1e . We perform topological sampling in phase II as follows. We randomly choose S nodes from all n nodes, and obtain the activities of these S nodes' partial neighbors. Introducing the sampled ego structures and nodes' activities into libraries L F and L G allows us to construct the self and interaction matrices Θ F and Θ G , and to further distill the elementary functions as well as their coefficients. We repeat the process to obtain K sets of samples, average the coefficients of the elementary functions inferred from the K sample sets. In the present work we set S = 10 and K = 20. The inference inaccuracy is quantified by symmetric mean absolute percentage error (sMAPE) 53 , where m is the cardinal number of the set that contains both inferred and true elementary functions, I i and R i are inferred and true coefficients respectively. The range of sMAPE is [0, 1]. The more accurate the inferred equation, the lower the value of sMAPE. Note that if an inferred elementary function should not exist in the true equation or a true elementary function is not successfully inferred, the value of sMAPE will significantly increase. Therefore, the metric sMAPE captures not only the errors of inferred coefficients but also the incorrectness of the inferred equation form. To evaluate the discrepancy between the inferred and true dynamics, we used the metric of normalized Euclidean distance (NED) that represents the distance between the two trajectories generated by the inferred and the true dynamical equations respectively. That is, Here x i is the true trajectory andx i is the trajectory generated by the inferred equation, t 0 and T are the beginning and ending times respectively, and D max is the longest Euclidean distance between a pair of points of true trajectory. The empirical networks data include C. elegans connectome 54-56 , the mushroom-body region of Drosophila 57 , the North-Europe power gird 58 , the U.S. power grid 59 , Advogato social network 60 retrieved from website http://networkrepository.com, and the worldwide airline network retrieved from OpenFights data (https://openflights.org/data.html). The empirical data of epidemic spreading include daily reported numbers of H1N1 and SARS cases available at Kaggle (https://www.kaggle.com/lnunes/a-brief-comparativestudy-of-epidemics/data), and the daily reported numbers of COVID-19 cases 61 . Source Data are available at the Code Ocean capsule 62 . All source codes are publicly available at the Code Ocean capsule 62 . -12 true inferred by phase I + II Figure 1 : Overview of the two-phase inference approach. a. Observation data of network topology Aij, including spurious and missing links, and low-resolution and noisy data of nodal activities xi(t). b. Mapping the normalized observation data into two matrices ΘF and ΘG that represent the time-varying patterns of elementary functions. c. Phase I that narrows down the model space by identifying several leading elementary functions through global regression for each dimension ofẋi(t). d. Comparison of trajectories generated respectively by the true network dynamics and by the dynamical equation inferred by Phase I alone. e. Phase II that performs local fine-tuning, by using topological sampling and weighted Akaike's information criterion (wAIC), to further determine the optimal number (indicated by pink stars) of elementary functions forF(xi(t)) andĜ(xi(t), xj(t)). f. Comparison of trajectories generated respectively by the true and the inferred dynamical equations. The example illustrated in (c-f ) is Hindmarsh-Rose neuronal dynamics on a directed BA network with size n = 100, average degree k = 5. Phase I c Phase II a. True FitzHugh-Nagumo (FHN) dynamics used to simulate nodes' activities data on various topologies. F d and G d are self-and interaction dynamics of the d-th dimension respectively, x i,d is d-th dimension's state of node i, and x p i,d is the polynomial with order p. b. Ten leading elementary functions identified by Phase I for each dimension. c. The necessary elementary functions and their coefficients further inferred through Phase II on two synthetic (directed ER and undirected SF) and one empirical (Drosophila mushroom body) networks, where g FHN denotes the term (xj − xi)/k in i . d. Relative errors ∆ of the inferred elementary functions and their coefficients. Note that the elementary functions ruled out from ΘF and ΘG by our approach (i.e. whose coefficients are inferred as zero) are not shown. e,f. Nodes' activites and trajectories generated respectively by the true and the inferred equations. ). c,d. Relative errors and six edges' activities of the inferred edge dynamics of social balance. e-i. Relative errors of the inferred effective equations for network dynamics of Kuramoto model and coupled Rössler oscillators. In both cases the self-dynamics are heterogeneous, i.e. the intrinsic frequency of each node is not identical but follows a normal distribution N (1, σ) with σ = 0.1. Grey curves represent the activity of individual nodes and black curves represents the averaging activity of systems. Symbols g kura and g ross denote terms sin (xj − xi) and (xj − xi) respectively. The details of these dynamics and empirical networks are shown in Supplementary Tables 3 and 4 . Figure 5 : Inference robustness against incompleteness and noises. a-e. Inference inaccuracies sMAPE when the nodes' activities data are low-resolution, have dynamical noises (Gaussian white noise with intensity η) or observational noises (intensity quantified by signal-to-noise ratio SNR) or when the topologies data have spurious and missing links. f-j. Comparisons of inference inaccuracies between SINDy, ARNI and our approach for inferring HR neuronal network dynamics, with varying amount of time-points, observational noise, correlated dynamical noise, missing links, and different network sizes. Simulation details are shown in Supplementary Table 4 . k, l. Comparison results of ablation studies. The box-whisker plots are visualized with the Tukey method (i.e. the box represents the interquartile range (IQR) and the line in the box shows the median, with whiskers that extend 1.5 times the IQR from the box edges; The outliers are also plotted) and the sample size is 20. Five ablation studies were performed: 1 ○ removing topological sampling, 2 ○ using original AIC instead of wAIC, 3 ○ removing Phase II, 4 ○ removing Phase I, or 5 ○ without normalization to ΘF and ΘG. Statistical significance is obtained through multiple Mann-Whitney tests. Three or four asterisks indicate p-value < 10 −3 or 10 −4 , and n.s. means not significant. igure 6: Inference of early spreading dynamics from empirical data. a. Worldwide airline network (partial) used for the inference. Each node represents a country or region, and line thickness represents the amount of passenger flow. The form describes the dynamical equation inferred by the two-phase approach. b-e. Comparisons between the empirical cumulative number of H1N1 cases for different nodes (dashed lines) and the cumulative number generated by the inferred equation (solid lines). For better visualization, the comparisons are displayed in four plots (from b to e) and the dates when the first case is reported for each node are all shifted to the first day (t = 1). f-i. Comparisons between the empirical and the inferred cumulative numbers in different nodes for SARS (f, g) and COVID-19 (h, i). Phase I. As shown in Supplementary Figure 3 , the values of the terms (i.e. elementary functions) in libraries L F and L G can span several orders of magnitude. Such a wide spread has detrimental effects on evaluating the relevance of each term to the true unknown dynamics. For example, if a term is inherently low-valued, a regression method will probably assign a relatively large coefficient to it, which might indicate the term is more relevant. However, this is not always true since it could be caused by the fact that the scale of this term is inherently smaller than that of other terms. To remove such detrimental effect, we hence do not use the original data but rather perform normalization for the derivativesẋ and each column of the matrices Θ F and Θ G . To be specific, the normalization of the i-th column of the matrix Θ is Constructing self-dynamics matrix ΘF,i=3 by mapping x3(t) to library LF that contains m1 elementary functions. c. Constructing interaction dynamics matrix ΘG,i=3 by mapping x3(t) and {xj(t)}, where j is in the set of node i's neighbors, to library LG that contains m2 interaction elementary functions. Through the Kronecker product I d of the identity matrix and ΘF,i=3 (or ΘG,i=3), we obtain matrices ΘF,i=3 and ΘG,i=3 capturing the whole d-dimensional nodes' activities on all elementary functions. Hence the optimization is to search for ξF,i=3 and ξG,i=3 that best solve the equation exhibited in d. Supplementary Figure 2 : An example for the calculation of wAIC. To evaluate the relevance of the term xi,1 to the true dynamics, we remove this term from the set of leading terms inferred by Phase I and then calculate the MSE1 between the dynamics composed of the remaining p terms and the observation data. Then we obtain the value of wAIC1 that represents the relevance of this removed term (the higher the more relevant). and the normalization of the d-th dimension of the derivative is Therefore Phase I aims to obtain the coefficients ξ m iṅ where m 1 and m 2 are the numbers of terms in libraries L F and L G respectively. The pseudocode of Phase I is described in Algorithm 1. Algorithm 1 Phase I Input: A: network topologẏ x: numerical derivatives Θ F , Θ G : libraries' matrices, for self-and interaction dynamics respectively κ: the number of leading terms to be identified Output: L * F , L * G : reduced libraries for self-and interaction dynamics respectively w: weight vector 1: # data normalization 2: normalizing of each column ofẋ, Θ F and Θ G ; 3: # global regression 4: minimizing of formula (3) in main paper via lasso to obtain ξ * F , ξ * G , and intercept value c; 5: # Judging whether the constant term c should exist 6: Temporally construct two reduced libraries: one contains only the κ leading terms with the largest absolute coefficients ξ * , the other contains the κ leading terms and the constant term; 7: compute AIC w.c. and AIC wo.c. for libraries with and without the constant term respectively. 8: if AIC w.c. − AIC wo.c. > β 1 AIC w.c. # β 1 : a threshold for the judgement 9: the constant term is necessary; 10: else 11: the constant term is unnecessary; 12: end By Phase I, the comprehensive libraries containing a large number of elementary functions that potentially compose the self-and interaction dynamics are significantly narrowed down to L * F and L * G . Because, for most systems of interest, the underlying dynamical equation often consists of only a few terms, the reduced libraries L * F and L * G contain top κ terms with the largest absolute coefficients described by the weight vector w. To determine whether the hidden equation should contain a constant term, we compare the AIC values of the equations with and without constant. If the AIC value of the equation with constant is far smaller than that without constant, the hidden equation probably has a constant term. In the present paper we set κ = 10, i.e. obtain top ten leading elementary functions for L * F and L * G by algorithm I. Note that the hyper-parameter λ in Phase I is automatically selected by 5-fold cross validation, a widely-used trick to estimate the most appropriate value of hyper-parameter. Specifically, to choose the best λ from a set {λ 1 , λ 2 , · · · , λ r }, we parameterize the loss function (L(λ 1 ), L(λ 2 ), · · · , L(λ r )), and randomly divide the nodes' activities data into five sets, with one set for test and the other four sets for training. We apply each value of λ and calculate the corresponding score of lasso regression. Finally the hyperparameter with the highest regression score is selected as the optimal value of λ [S1] . Phase II. Through Phase I we have obtained a reduced model space that represents the set of most relevant elementary functions. With the reduced libraries L * F , L * G at hand, next we propose Phase II to infer a concise form as well as appropriate coefficients for the hidden dynamical equation. The procedure of Phase II has two key sub-steps, i.e. topological sampling, and fine-tuning with wAIC (see Methods). The pseudocode of Phase II is described in Algorithm 2. Algorithm 2 Phase II Input:ẋ: numerical derivatives L * F , L * G : the reduced libraries by Phase I w: weight vector obtained by Phase I Output:L F ,L G : the final inferred terms for self-and interaction dynamics respectivelŷ ξ F ,ξ G : the final inferred coefficients for self-and interaction dynamics respectively 1: extract the reduced matricesΘ F andΘ G from Θ F and Θ G based on L * F and L * G ; 2: # topological sampling 3: for i = 1; i < K; i + + # K: number of topological samples # assessment with wAIC 7: for j = 1; j <= κ; j + + 8: remove the term j 9: compute AIC j for the combination of terms without term j; 10: wAIC j ← w j · AIC j ; end 19: end 20: obtainL F andL G by merging the identified terms of all K topological samples; 21: obtainξ F andξ G by averaging the inferred coefficients across all K topological samples; In the present paper we used both synthetic and empirical networks to validate our two-phase inference approach. The synthetic networks are generated by directed Erdős-Rényi (ER) model [S2] for random topologies, as well as Barabási-Albert (BA) [S3] and the static model [S4] for scale-free (SF) topologies. The empirical network datasets are obtained from neuronal, social and technological domains. Erdős-Rényi (ER) networks are generated by erdos renyi graph(n, p) in networkX [S5] , where n = 100, p = 0.05. The resulting average degree k = 5.2. For each link between node i and j, we assign the link direction, i.e. from i to j, from j to i, or a reciprocal connection between them, with equal probability. Directed SF networks are generated by Barabási-Albert model, i.e. using barabasi albert graph (n, m) in networkX [S5] , where n = 100, m = 5. We set each link bidirectional, and then randomly delete a proportion of these unidirectional links while ensuring that the network is weakly-connected. Finally we obtain a directed network with average total-degree k = 5.1. Directed scale-free networks are generated by the static model [S4] . The procedure is as follows: we assign a weight ω i = i −η to the node with index i, where i = 1, 2, · · · , n and 0 < η ≤ 1, then randomly select a pair of nodes i and j (i = j) with probability proportional to ω i and ω j respectively. If there the link from node i to j already exists, we randomly select another pair of nodes with proportional to their weights; Otherwise, we connect node i to j. Repeat this process until the number of links reaches mn. Hence the average total-degree k = 2m, and the power-law exponent γ = 1+η η which can be tuned by η. The empirical networks used in this work include C. elegans connectome with 279 neurons [S6] - [S8] , the mushroom-body region of Drosophila with 1832 neurons [S9] , the North-Europe (N.E.) power grid with 236 nodes [S10] , the U.S. power grid with 4941 nodes [S11] and a community in the weighted social network of Advogato with 623 nodes [S12] . The properties of these empirical networks are described in Supplementary Table ( 3). In this paper we demonstrated the effectiveness of our two-phase approach in inferring a wide range of complex network dynamics pertaining to brain, social and coupled nonlinear oscillators systems. Neurons exhibit spiking activities which is believed to be an essential component in information processing in brains [S13] . To simulate such activities we employed the 3-dimensional Hindmarsh-Rose (HR) model [S14] , a moderately simplified version of the HodgkinšCHuxley [S13] model. The true equations governing the HR network dynamics are where the coupling term Here x i,1 is the membrane potential of neuron i, x i,2 is the transport rate of sodium and potassium ions across the membrane through the ion channels, and x i,3 is the adaptation current. The x i,1 is a simplified notation for x i,1 (t). Parameters a = 1, b = 3, c = 1, u = 5, s = 4, r = 0.005, x 0 = −1.6, coupling strength = 0.15, V syn = 2, λ = 10, Ω syn = 1, and I ext is external current which is set to 3.24 for all neurons. The coupling strength is tunable according to size and topology of networks (see sec. IV). Applying our approach to the neuronal activities data generated by HR dynamics on a directed scale-free network, we obtain the inferred equations typically as where g HR Note that all true elementary functions have been successfully inferred, and inaccuracies of their coefficients are lower than 3% (as also shown in Fig. 3a in the main paper). The neuronal activities and trajectories generated by the true Eq. (S12, S13) and the inferred Eq. (S14) are shown in Supplementary Fig. 4 . We also tested our approach by applying it to the neuronal activities data generated by FitzHugh-Nagumo (FHN) dynamics [S15] on various networks. The equations governing the FHN neuronal network dynamics True Inferred NED=0.08 Supplementary Figure 4 : Neuronal activities and trajectories generated by the true HR network dynamics and the inferred dynamical equations. The FHN dynamics capture the firing behaviors of neurons with two components. The first component represents the membrane potential containing self-and interaction dynamics, where k in i is the in-degree of neuron i, and = 1. The second component represents a recovery variable where a = 0.28, b = 0.5 and c = −0.04. The equations inferred by our approach from the neuronal activities data generated on a directed scalefree network are shown as Eq. (S16). The activities and trajectories generated by the true Eq. (S15) and the inferred Eq. (S16) are shown in Supplementary Fig. 5 . We also demonstrated the effectiveness of our approach in inferring gene regulation (GR) dynamics. We generate the gene activities data according to the equation [S16] dx i,1 dt where x i,1 (t) is the concentration of gene i at time t, α = 2 denoting the Hill coefficient, and b = −0.2. Parameters ij denotes the regulation strength of gene j on gene i which is set as ij = 0.1 or 0.05 on different networks. The dynamical interaction equation inferred by our approach from the data generated on an undirected scale-free network with ij = 0.05 is The interaction function is automatically inferred from library L G that contains activation function { x γ x γ +1 }. The activities and trajectories generated by the true and the inferred equations are shown in Supplementary Fig. 6 . We validated the two-phase inference approach for two types of coupled nonlinear oscillators, in which the self-dynamics are heterogeneous. The first one is Kuramoto network dynamics [S17] , [S18] : It describes the dynamics of n oscillators with natural frequencies following a normal distribution ω ∼ N (1, σ) with σ = 0.1, and the oscillators are coupled via the links A ij . Parameter denotes the coupling strength of the interactions. The equation inferred by our approach from the activities of oscillators generated on a directed scale-free network with = 0.025 is Another case for coupled nonlinear oscillator system is Rössler network dynamics [S19]- [S21] . We first generated chaotic activities data according to the true equations governing heterogeneous Rössler dynamics where coupling strength = 0.15 or 0.2 on different networks, a = 0.2, b = 0.2, and c = −6. Oscillators' natural frequencies follow a normal distribution ω ∼ N (1, σ) with σ = 0.1. The equations inferred by our approach from the data generated on a directed scale-free network with = 0.15 are where the coefficient 1.011 is the inferred effective frequencyω (i.e. the average value ω of nodes' natural frequencies). The activities and trajectories generated by the true and the inferred dynamical equations are shown in Supplementary Fig. 8 . We also generated homogeneous Rössler dynamics according to the true equations where = 0.1, a = 0.35, b = 0.2 and c = −5.7. The equations inferred by our approach from the data generated on a directed scale-free network are To illustrate the generative power of the inferred equations obtained by our approach, we set different initial states to validate if the inferred equations can reproduce the dynamics of the original model. The results in Supplementary Fig. 10 show that the inferred equations indeed resemble the dynamics of original models. It is worth mentioning that although our approach infers the true equations with high precision, the distance between the inferred and the true trajectories increases for longer time due to the chaoticity of the original model. We also tested our approach with a social balance dynamics described by edge dynamical equation [S33] , i.e. the relation between two individual nodes is represented by x ij . Entry x ij represents the strength of trust or distrust between i and j. The strength of trust or distrust between nodes evolves according tȯ We rewrite Eq. (S25) explicitly in terms of entries x ij as The edge activities generated by the true equation are shown in Supplementary Fig. 11 . To infer such a equation, we construct libraries by a variety of functions with temporal topologies data X(t), for example, X 2 , X 3 , sin(X), etc. By our approach, we automatically infer the exact term and its coefficient, leading to the equation To ensure the reproducibility of our results we list the parameters of dynamics and networks for each plot in the main paper. The simulations of dynamics are from time 0 to T with step-size δt. To quantify the degree of synchronization of a dynamical network, the sequence of peaks in x i (t) is extracted for defining the geometric phase θ i (t) of each node i. For example, in the neuronal spiking activities data, we use a Poincaré's section at x i = 0.5 to determine the time of the start (upward sense) and the end (downward sense) of a spike (Supplementary Fig. 12(a,c) ). Between the l-th spike and (l + 1)-th spike, the phase θ i (t) is increased by 2π, such that an interpolation between the two spikes can define the continuous time-varying phase as [S23] , [S24] where t is the current time, and t l,i is the time at which node i starts the l-th spike. The degree of synchronization of the dynamical network is then quantified by order parameter When R(t) is close to zero, the dynamical network is totally disorder; If R(t) = 1, the activities of all nodes are completely synchronized, indicating a complete phase synchronized state. Averaging R(t) over time we obtain a single order parameter to quantify network synchronization, i.e. where t 0 and t f are the start and the end of the time range respectively. A network usually becomes more synchronized when the coupling strength increases. It is worth noting that, for some network dynamics, such as FHN, the nodes can evolve into two categories, i.e. get trapped in two fixed points ( Supplementary Fig. 12e ). For this case, the order parameters of the two categories are calculated respectively and the average of them is considered as the order parameter of the whole network. If the hidden true dynamics is not a combination of elementary functions or if some elementary function of the hidden dynamics is not included in the large libraries L F and L G , our approach will infer alternative forms of F and G that are still able to characterize the observed system behaviors. To demonstrate such capacity, we deliberately remove some elementary functions that should exist in the true network dynamics. For example, we delete interaction terms xi,1 1+e −α(x j,1 −β) and 1 1+e −α(x j,1 −β) (α = 10, β = 1), yet to infer HR network dynamics with r = 0.004 and x 0 = −1.5. Even for such deficient libraries our approach infers alternative equations that still capture the HR network dynamics (see Fig. 4c in the main paper), as well as the alternative equation to capture GR network dynamics (see also Fig. 4c ). In this subsection we compare our approach with SINDy [S23] and its variant [S36] regarding inferrability, i.e. the capacity of dealing with partially synchronized networked dynamics or dynamical heterogeneity. The results showing in Supplementary Fig. 13 indicate that our method is better at coping with synchronization and dynamical heterogeneity. The network properties and dynamics settings are the same as that in Fig. 4 of the main paper. In this work we validated the robustness of our approach against five types of unavoidable uncertainties in observation data, including observational, uncorrelated and correlated dynamical noises in data of nodes' activities, missing and spurious links in observed topologies, low resolution by experimental techniques. To construct incomplete topologies we randomly add or delete a proportion of entries in the true adjacency matrix A ij ; To imitate the constraint of low resolution we regularly down-sample the time series of nodes' activities. In the following we describe in detail the simulations of data with observational, dynamical and correlated noise, and show the failure ratio of inferring the structure of true elementary functions There are usually two types of noises in time series data, i.e. dynamical noise and observational noise [S27] . The latter is raised in the observation or measurement process, while the former indicates the inherent stochasticity of system dynamics. To test the robustness of our approach against these two types of noises, we simulate the data through [S27] where ηA · dW i (t) is dynamical noise, and aβ X (t) is observational noise. Here dW i (t) is the Gaussian white noise, which follows a normal distribution with mean zero and standard deviation √ dt, and A is the average amplitude of the original time series. Here η represents the intensity of dynamical noises relative to the signal amplitude. We simulate Eq. (S33) by using the fourth-order Runge-Kutta method with fixed time step size. The β X (t) is a Gaussian noise following a normal distribution with zero mean and standard deviation one, hence the intensity of observational noise can be tuned by the value of a. In the paper we generate observational noisy data by awgn in MatLab and characterize the noisy data with specified Signal-to-Noise-Ratio (SNR). Correlated dynamical noise could exist in complex systems when the noises are correlated across different nodes (such as in neuronal activities [S28] , [S29] ). Then the network dynamics become where = { i } is correlated noise across all nodes drawn from a multivariate normal distribution with mean 0 and covariance Σ, i.e. ∼ η N (0, Σ). The noise covariance matrix Σ is semidefinite which is constructed by Σ = AA , where A is adjacency matrix, and η is the strength of correlated noise. In the main paper we also validate the robustness of our approach against such correlated noises (Fig. 5j) . In the main paper we showed that each key sub-step of our two-phase approach is indispensable for inferring the hidden network dynamics from noisy and incomplete data. Here we further show that, even for clean data these key sub-steps are also necessary. To do so we individually remove each sub-step and test the performance of the corresponding deficient approach. The result of these ablation studies are shown in Supplementary Fig. 15 ○ removing the sub-step of topological sampling, 2 ○ using original AIC instead of wAIC, 3 ○ removing Phase II, 4 ○ removing Phase I, 5 ○ No normalization in ΘF and ΘG. Statistical significance is obtained by multiple Mann-Whitney test [S30] , [S31] . Three or four asterisks indicate p-value 10 −3 or < 10 −4 , and n.s. means not significant. increased inaccuracy. 3 ○ Phase II ablated: Without the local fine-tuning, Phase I alone is not able to obtain a compact form of the hidden equation. Moreover, as shown in Fig. 1d in the main paper, even though the equation inferred by Phase I alone fits the observational data well, it does not have generative power. ○ Phase I ablated: If Phase I is ablated from the approach, we will lose the global information that captures the coarse yet consistent structure of the hidden dynamics. Hence we find the ablation of Phase I also significantly increase the inference inaccuracy. ○ No normalization in Phase I: As shown in Supplementary Fig. 3 , the value of several elementary functions can span several orders of magnitude, possibly significantly larger than that of others. If the sub-step of normalization is removed the coefficients of these inherently high-value terms, obtained by regression will be very small, which decreases the possibility of the existence of these terms. In the main paper we show the results of the comparisons between our approach, SINDy, and ARNI. The original ARNI is a model-free inference approach for network structure reconstruction [S32] . ARNI uses the idea of matrix pseudoinverse to connect a node to other most relevant nodes by minimizing the cost between numerical derivatives of observed time series and the sum value of basis functions, enabling the inference of network topology from nodes' activities. We would like to emphasize that although ARNI originally aimed at inferring network structure, it can also be used to infer network dynamics if network structure is given. For a fair comparison, we transfer ARNI to infer network dynamics as follows. We apply the same idea of matrix pseudoinverse and add most relevant elementary functions one by one, as shown in Supplementary Fig. 16 , until the cost is lower than a threshold. To improve ARNI's ability of coping with the possible incompleteness in observed network structure, we also perform topological samplings in ARNI. The resulting elementary functions and their coefficients are the value averaged for multiple topological samplings. The details of ARNI modification can be seen in our code. We also perform robustness comparisons between our approach and SINDy variant [S26] against observational noises. The results are shown in Supplementary Fig. 17 , where the task is to inferring HR dynamics Supplementary In the main text we have demonstrated the applicability of our approach to inferring dynamical equations from empirical data. Specifically we used the global spreading data of H1N1 disease reported daily from April 24th to July 6th in year 2009. As we focused on the dynamics of early spreads before governments introduce quarantine policies, only the first 45 daily data were used for the inference. For instance, if the first case in a country was reported on May 1st, we employed the data from May 1st to June 14th. Note that, although the starting time of spread for different countries are different, in the coupling dynamics G(x i , x j ) the time t is the same date for all nodes. In other words, in Fig. 6 and Supplementary Figs. 18-20 the initial time t [/day] being shifted to t = 1 for all nodes are just for the convenience of visualization. The inference approach indeed used the original dates and data. The comparisons between the inferred and the empirical activities are also displayed in Supplementary Fig. 18 for H1N1, in Supplementary Fig. 19 for SARS and in Supplementary Fig. 20 for COVID-19. Supplementary Figure 20 : Comparison between empirical number of COVID-19 cases, the infected numbers generated by the inferred Eq. (5) in the main text with identical parameters a and b for all nodes (i.e. effective), and the infected numbers generated by the form of Eq. (5) yet with heterogeneous parameters ai and bi for each node i (i.e. heterogeneous). The parameter for effective dynamics are a = 0.040 and b = 105.160, and those ai and bi for heterogeneous dynamics are shown in the sub-plots respectively. Note that Eq. (5) was inferred based only on H1N1 data, hence the results in this figure show that the inferred equation is also able to capture the early spreads of COVID-19. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals High-throughput sequencing technologies Advancements in next-generation sequencing The role of the airline transportation network in the prediction and predictability of global epidemics The hidden geometry of complex, network-driven contagion phenomena Mobility network models of covid-19 explain inequities and inform reopening The Structure and Dynamics of Networks Universality in network dynamics Dynamic patterns of information flow in complex networks Coupling functions: universal insights into dynamical interaction mechanisms Dynamic models of large-scale brain activity Predicting perturbation patterns from the topology of biological networks Continuous-time model of structural balance Exploring complex networks Synchronization in small-world systems Model selection for dynamical systems via sparse regression and information criteria Model-free inference of direct network interactions from nonlinear collective dynamics Detecting and quantifying causal associations in large nonlinear time series datasets Detecting causality in complex ecosystems Causal network inference by optimal causation entropy Reconstructing effective phase connectivity of oscillator networks from observations Regression dcm for fmri Estimation of directed effective connectivity from fmri functional connectivity hints at asymmetries of cortical connectome Stochastic dynamics as a principle of brain function Learning non-stationary langevin dynamics from stochastic observations of latent trajectories Inferring the dynamics of black-box systems using a learning machine Stable irregular dynamics in complex neural networks Discovery of nonlinear multiscale systems: Sampling strategies and embeddings The physics of higher-order interactions in complex systems From networks to optimal higher-order models of complex systems Numerical solution of stochastic differential equations in finance A new look at the statistical model identification A pragmatic view of accuracy measurement in forecasting The structure of the nervous system of the nematode caenorhabditis elegans Structural properties of the Caenorhabditis elegans neuronal network Network control principles predict neuron function in the caenorhabditis elegans connectome A connectome and analysis of the adult drosophila central brain How dead ends undermine power grid stability Konect: the koblenz network collection The network data repository with interactive graph analytics and visualization An interactive web-based dashboard to track covid-19 in real time A two-phase approach for inferring complex network dynamics Scikit-learn: Machine learning in Python On the evolution of random graphs Scale-free networks Universal behavior of load distribution in scale-free networks Exploring network structure, dynamics, and function using networkx The structure of the nervous system of the nematode Caenorhabditis elegans Structural properties of the Caenorhabditis elegans neuronal network Network control principles predict neuron function in the Caenorhabditis elegans connectome A connectome and analysis of the adult drosophila central brain How dead ends undermine power grid stability Konect: the koblenz network collection The network data repository with interactive graph analytics and visualization Dynamical principles in neuroscience Inference of topology and the nature of synapses, and the flow of information in neuronal networks Impulses and physiological states in theoretical models of nerve membrane Reconstructing nonlinear dynamic models of gene regulation using stochastic sampling A universal order parameter for synchrony in networks of limit cycle oscillators Network dynamics of coupled oscillators and phase reduction techniques Synchronization in complex networks Phase synchronization of coupled rössler oscillators: amplitude effect Connectivity influences on nonlinear dynamics in weakly-synchronized networks: Insights from rössler systems, electronic chaotic oscillators, model and biological neurons Continuous-time model of structural balance Mechanism for explosive synchronization of neural networks Synchronization in networks of hindmarsh-rose neurons Discovering governing equations from data by sparse identification of nonlinear dynamical systems Model selection for dynamical systems via sparse regression and information criteria Estimating the level of dynamical noise in time series by using fractal dimensions When and why noise correlations are important in neural decoding Modeling correlated noise is necessary to decode uncertainty The Mann Whitney Wilcoxon distribution using linked lists A computational approach to statistics Model-free inference of direct network interactions from nonlinear collective dynamics The authors declare no competing interests.