Predicting Dynamics on Networks Hardly Depends on the Topology
Bastian Prasse and Piet Van Mieghem
2020-05-29

Abstract: Processes on networks consist of two interdependent parts: the network topology, consisting of the links between nodes, and the dynamics, specified by some governing equations. This work considers the prediction of the future dynamics on an unknown network, based on past observations of the dynamics. For a general class of governing equations, we propose a prediction algorithm which infers the network as an intermediate step. Inferring the network is impossible in practice, due to a dramatically ill-conditioned linear system. Surprisingly, a highly accurate prediction of the dynamics is possible nonetheless: even though the inferred network has no topological similarity with the true network, both networks result in practically the same future dynamics.

The interplay of dynamics and structure lies at the heart of myriad processes on networks, ranging from predator-prey interactions on ecological networks [1] and epidemic outbreaks on physical contact networks [2] to brain activity on neural networks [3]. To relate the network structure and the process dynamics, there are two approaches of opposing directions. On the one hand, a great body of research [4, 5, 6] focusses on the question: What is the impact of the network structure on the dynamics of a process? For instance, what is the impact of the network of online social media friendships on the spread of fake news? On the other hand, network reconstruction methods [7, 8, 9, 10] consider the inverse problem: Given some observations of the dynamics, what can we infer about the network structure? As an example, one may aim to determine the path of an infectious virus from one individual to another, given observations of the epidemic outbreak.
The prediction of dynamics on an unknown network seems to require the combination of both directions: first, the reconstruction of the network structure based on past observations of the dynamics and, second, the estimation of the future dynamics based on the inferred (i.e., reconstructed) network. Intuitively, one may expect that an accurate prediction of the dynamics is possible only if the network reconstruction is accurate. In this work, paradoxically, we show the contrary: it is possible to accurately predict a general class of dynamics without the network structure.

[Table 1: self-dynamics f_i(x_i(t)) and interaction functions g(x_i(t), x_j(t)) of the six models; e.g., for the Cowan-Wilson (CW) model, f_i(x_i(t)) = −x_i(t) and g(x_i(t), x_j(t)) = (1 + exp(−τ(x_j(t) − µ)))^−1.]

The network is represented by the N × N weighted adjacency matrix A, whose elements are denoted by a_ij. If there is a directed link from node j to node i, then it holds that a_ij > 0, and a_ij = 0 otherwise. Throughout this work, we make a clear distinction between the network topology and the interaction strengths [11]. The network topology, or network structure, is the set of all links: all node pairs (i, j) for which a_ij > 0. If there is a link from node j to node i, then the interaction strength is specified by the link weight a_ij. For instance, consider two 3 × 3 adjacency matrices A and Â for which, for all nodes i, j, it holds that a_ij > 0 if and only if â_ij > 0. Hence, the two matrices A and Â have the same network topology. However, the interaction strength, e.g., from node 2 to node 1 is different, because a_12 = 1 but â_12 = 2.

We denote the nodal state of node i at time t by x_i(t) and the nodal state vector by x(t) = (x_1(t), ..., x_N(t))^T. We consider a general class of dynamical models on networks [12, 7, 13] that describe the evolution of the nodal state x_i(t) of any node i as

    dx_i(t)/dt = f_i(x_i(t)) + Σ_{j=1}^N a_ij g(x_i(t), x_j(t)).    (1)

The function f_i(x_i(t)) describes the self-dynamics of node i. The sum in (1) represents the interactions of node i with its neighbours.
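The governing equations (1) can be integrated numerically. A minimal sketch, assuming forward-Euler integration and the SIS model of Table 1 as a concrete choice of f_i and g; the network, rates and step size below are arbitrary illustrative values, not the paper's setup:

```python
import numpy as np

# Sketch of the general model (1): dx_i/dt = f_i(x_i) + sum_j a_ij * g(x_i, x_j),
# illustrated with SIS dynamics: f_i(x) = -delta_i * x, g(x_i, x_j) = (1 - x_i) * x_j.

def simulate(A, f, g, x0, dt, n_steps):
    """Forward-Euler integration of the nodal state dynamics (1)."""
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        # Interaction term: G[i, j] = g(x_i, x_j), then sum_j a_ij * G[i, j].
        G = g(x[:, None], x[None, :])
        x = x + dt * (f(x) + (A * G).sum(axis=1))
        trajectory.append(x.copy())
    return np.array(trajectory)

rng = np.random.default_rng(0)
N = 5
A = rng.uniform(0.5, 1.5, (N, N)) * (rng.random((N, N)) < 0.4)  # random weights
np.fill_diagonal(A, 0.0)
delta = rng.uniform(0.5, 1.5, N)                 # curing rates delta_i
f = lambda x: -delta * x                         # SIS self-dynamics
g = lambda xi, xj: (1.0 - xi) * xj               # SIS interaction function
x0 = rng.uniform(0.0, 0.1, N)                    # initial infection probabilities

X = simulate(A, f, g, x0, dt=0.01, n_steps=500)
print(X.shape)   # n + 1 = 501 nodal state observations of N = 5 nodes
```

The same loop applies to any of the six models by swapping the functions f and g.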
The interaction between two nodes i and j depends on the adjacency matrix A and the interaction function g(x_i(t), x_j(t)). A broad spectrum of models follows from (1) by specifying the self-dynamics function f_i and the interaction function g. We study six particular models of dynamics on networks, which are summarised by Table 1:

Lotka-Volterra population dynamics (LV) The Lotka-Volterra model [14] describes the population dynamics of competing species. The nodal state x_i(t) denotes the population size of species i, the growth parameters of species i equal α_i > 0 and θ_i > 0, and the link weight a_ij quantifies the competition rate, or predation rate, of species j on species i.

Mutualistic population dynamics (MP) We adopt the model of Harush and Barzel [15] to describe mutualistic population dynamics. The nodal state x_i(t) denotes the population size of species i, the growth parameters of species i are denoted by α_i > 0 and θ_i > 0, and the link weight a_ij > 0 quantifies the strength of mutualism between species i and species j.

Michaelis-Menten regulatory dynamics (MM) The dynamics of gene regulatory networks can be described by the Michaelis-Menten equation [16, 17, 15]. The nodal state x_i(t) is the expression level of gene i, the Hill coefficient is denoted by h, and the link weights a_ij > 0 are the reaction rate constants.

Susceptible-Infected-Susceptible epidemics (SIS) Spreading phenomena, such as the epidemic of an infectious disease, can be described by the susceptible-infected-susceptible model [18, 19, 20, 2]. The nodal state x_i(t) equals the infection probability of node i. The parameter δ_i > 0 denotes the curing rate, and the link weight a_ij is the infection rate from node j to node i.

Kuramoto oscillators (KUR) The Kuramoto model [21] has been applied to various synchronisation phenomena of phase oscillators, such as fMRI activity of brain regions [3].
Here, the nodal state x_i(t) corresponds to the phase of oscillator i, the parameter ω_i denotes the natural frequency of node i, and the coupling strength from node j to node i is given by the link weight a_ij.

Cowan-Wilson neural firing (CW) The firing rates of neurons can be described by the Cowan-Wilson model [22, 13]. Here, the nodal state x_i(t) is the activity of neuron i, and the parameters τ and µ are the slope and the threshold of the neural activation function. The link weight a_ij specifies the number and strength of synapses from neuron j to neuron i.

As stated in [23], there are three possibilities for the qualitative long-term behaviour of the dynamical system (1). First, the nodal state x(t) might approach a steady state x_∞ = lim_{t→∞} x(t). At the steady state x_∞, the nodal state does not change any longer, and it holds that dx(t)/dt = 0. Second, the nodal state x(t) might converge to a limit cycle, a curve on which the nodal state x(t) circulates forever. Third, the nodal state x(t) might never come to rest nor enter a repeating cycle. Then, the state x(t) perpetually continues to move in an irregular pattern.

The true adjacency matrix A is unknown. To predict the nodal state x(t), we obtain an estimate of the matrix A from past observations of the nodal state x(t). With the estimated matrix Â, we can approximate the governing equations (1) for the nodal state x(t). Figure 1 illustrates the framework for predicting network dynamics. We consider n + 1 nodal state observations x(0), x(Δt), ..., x(nΔt) from the initial time t = 0 until the observation time t = t_obs. Here, Δt > 0 denotes the sampling time with nΔt = t_obs. For a sufficiently small sampling time Δt, the derivative is approximated at every time k = 0, ..., n − 1 by

    dx_i(kΔt)/dt ≈ (x_i((k + 1)Δt) − x_i(kΔt))/Δt.    (2)

A crucial observation is that the derivative dx_i(t)/dt in (1) is linear with respect to the entries a_ij of the adjacency matrix A.
Thus, we obtain from (1) and the discrete-time approximation (2) an approximate linear system

    V_i ≈ F_i (a_i1, ..., a_iN)^T,    (3)

where the n × 1 vector V_i equals

    (V_i)_k = (x_i(kΔt) − x_i((k − 1)Δt))/Δt − f_i(x_i((k − 1)Δt)),  k = 1, ..., n,    (4)

and the n × N matrix F_i equals

    (F_i)_kj = g(x_i((k − 1)Δt), x_j((k − 1)Δt)),  k = 1, ..., n,  j = 1, ..., N.    (5)

From (3), we obtain an estimate of the adjacency matrix A by solving

    min_{a_i1, ..., a_iN} ||V_i − F_i (a_i1, ..., a_iN)^T||_2^2 + ρ_i Σ_{j=1}^N |a_ij|,  subject to a_ij ≥ 0,    (6)

for every node i. The optimisation problem (6) is known as the least absolute shrinkage and selection operator (LASSO) [24]. The application of the LASSO, and variations thereof, to network reconstruction is an established approach [25, 7, 8, 26]. The first term in (6) measures the consistency of the link weights a_i1, ..., a_iN with the observations x(0), ..., x(nΔt), given the dynamical model (1). The second term favours a sparse solution. The greater the regularisation parameter ρ_i > 0, the sparser the reconstructed adjacency matrix Â. We set the value of the parameter ρ_i by hold-out cross-validation [27]. The LASSO (6) can be interpreted as a Bayesian estimation problem, provided an exponential prior degree distribution of the adjacency matrix A. For more details on the reconstruction algorithm and the Bayesian interpretation, we refer the reader to Appendix A.

To evaluate the prediction algorithm outlined in Section 3, we consider the dynamics in Table 1 on the respective real-world networks: (LV) food web of Little Rock Lake [28], (MP) mutualistic insect interactions [29, 30], (MM) gene regulatory network of the yeast S. cerevisiae [31], (SIS) face-to-face contacts between visitors of the "Infectious: stay away" exhibition [32], (KUR) structural connectivity between brain regions [33, 34], (CW) C. elegans neuronal connectivity [35, 36]. Appendix B specifies the real-world networks and model parameters in detail. Figure 2 shows that the nodal state prediction x̂(t) is accurate at all times t ≥ t_obs, except for the Kuramoto model. The Kuramoto nodal state prediction x̂(t) is accurate from time t = t_obs until t ≈ 2t_obs, but then diverges from the true nodal state x(t).
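The per-node LASSO (6) can be sketched as follows. This is an illustrative implementation under stated assumptions: SIS dynamics on a random stand-in network, and a generic bound-constrained solver in place of a dedicated LASSO solver (for nonnegative weights, the ℓ1 penalty reduces to a plain sum):

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of estimating row i of A from the linear system (3): V_i ~ F_i a_i,
# with the nonnegativity constraint a_ij >= 0. Synthetic SIS data; all
# parameters (network, rho, step size) are arbitrary choices for the sketch.

rng = np.random.default_rng(1)
N, n, dt = 10, 200, 0.01
A = rng.uniform(0.5, 1.5, (N, N)) * (rng.random((N, N)) < 0.3)
np.fill_diagonal(A, 0.0)
delta = rng.uniform(0.5, 1.5, N)

x = rng.uniform(0.0, 0.1, N)
X = [x.copy()]
for _ in range(n):
    x = x + dt * (-delta * x + (1.0 - x) * (A @ x))   # SIS dynamics, Euler step
    X.append(x.copy())
X = np.array(X)                                        # (n+1) x N observations

i = 0
# V_i: finite-difference derivative minus self-dynamics f_i(x_i) = -delta_i x_i.
V = (X[1:, i] - X[:-1, i]) / dt + delta[i] * X[:-1, i]
# F_i: interaction terms g(x_i(k dt), x_j(k dt)) = (1 - x_i) x_j.
F = (1.0 - X[:-1, i])[:, None] * X[:-1, :]

rho = 1e-4
def objective(a):
    r = F @ a - V
    return r @ r + rho * a.sum()        # l1 penalty equals the sum for a >= 0

res = minimize(objective, np.zeros(N), method="L-BFGS-B",
               bounds=[(0.0, None)] * N)
a_hat = np.where(res.x < 0.01, 0.0, res.x)   # round off small link weights
print(a_hat)
```

A dedicated solver (e.g. coordinate descent) would be used at scale; the bound-constrained quadratic above is only the smallest self-contained stand-in.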
In Section 5, we explain what makes the Kuramoto model different. In view of the high prediction accuracy, we face the fundamental question: How similar are the topologies of the estimated network Â and the true network A? We quantify the similarity of the networks A and Â by two topological metrics. First, we consider the area under the receiver operating characteristic curve (AUC) [37]. An AUC of 0.5 corresponds to reconstructing the network by tossing a coin for every possible link. The closer the AUC is to 1, the greater the similarity of the reconstructed topology to the true topology. Second, we consider the in-degree distribution of the matrices A and Â. The in-degree d_i of node i equals the number of links that end at node i. The in-degree distribution is given by Pr[D ≥ d], where D is the in-degree of a randomly chosen node in the network.

Figure 3 compares the reconstructed network Â to the true network A. The AUC value is close to 0.5 for all models. Hence, the topology of the reconstructed network Â bears practically no resemblance to the true network topology. Moreover, the degree distribution Pr[D ≥ d] of the reconstructed network Â differs strongly from the degree distribution of the true network, except for Figure 3c and Figure 3f. We emphasise that, even if two networks have the same degree distribution Pr[D ≥ d], the network topologies can be entirely different. For instance, the AUC value equals only 0.53 in Figure 3f.

The dramatic contrast of accurate prediction of dynamics but inaccurate network reconstruction demands an explanation. The sole input to the prediction algorithm is the set of observations of the nodal state x(t), which has two implications. First, the nodal state sequence x(0), ..., x(nΔt) does not contain sufficient information to infer the network topology. Second, we do not need the topology to predict the nodal state x(t). But, if not the topology, what else is required to accurately predict dynamics on networks?

Example 1.
Consider a small network of N = 3 nodes with a weighted adjacency matrix A whose first row equals a_11 = 0, a_12 = 1 and a_13 = 1. Suppose that the nodal state vector equals x(t) = (c_1(t), c_2(t), c_2(t))^T at every time t, where c_1(t) and c_2(t) denote some scalar functions. In other words, node 2 and node 3 have the same state at every time t. As a vector equation, the nodal state x(t) satisfies

    x(t) = c_1(t) y_1 + c_2(t) y_2,    (7)

with the orthogonal vectors y_1 = (1, 0, 0)^T and y_2 = (0, 1, 1)^T. For simplicity, we only consider the estimation of the links to node 1, i.e., a_11, a_12 and a_13. The evolution of the nodal state x_1(t) follows from the dynamical model (1) as

    dx_1(t)/dt = f_1(x_1(t)) + g(x_1(t), x_2(t)) + g(x_1(t), x_3(t)).

However, since x_2(t) = x_3(t) = c_2(t) at every time t, it also holds that

    dx_1(t)/dt = f_1(x_1(t)) + 2g(x_1(t), x_2(t)).

Thus, if we estimated the adjacency matrix with â_11 = 0, â_12 = 2 and â_13 = 0, then we could perfectly predict the nodal state x_1(t). But neither estimate â_12 nor â_13 is equal to the true link weights a_12 and a_13, respectively.

For Example 1, the estimate Â yields a perfect prediction of the dynamics, because 2g(x_1(t), x_3(t)) = 2g(x_1(t), x_2(t)). More generally, the estimated network Â predicts the dynamics (1) exactly if and only if, at every future time t ≥ t_obs,

    Σ_{j=1}^N â_ij g(x_i(t), x_j(t)) = Σ_{j=1}^N a_ij g(x_i(t), x_j(t))  for every node i.    (8)

The network topology of the estimate Â is relevant for predicting the dynamics only if the topology relates to (8). We emphasise that (8) is linear with respect to the matrix Â but not linear with respect to the nodal state x(t), unless the interaction function g is linear. In Example 1, there exists a matrix Â ≠ A that satisfies (8), because the 3 × 1 nodal state vector x(t) is equal to the linear combination (7) of only 2 orthogonal vectors y_1 = (1, 0, 0)^T and y_2 = (0, 1, 1)^T. In general, it is possible to approximate any N × 1 nodal state vector x(t) by

    x(t) ≈ Σ_{p=1}^m c_p(t) y_p    (9)

at every time t ∈ [0, T_max]. Here, the agitation modes y_1, ..., y_m are some orthogonal vectors. The approximation (9) is known as the proper orthogonal decomposition [38, 39]. The greater the number of agitation modes m, the more accurate the approximation (9).
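Example 1 can be checked numerically. A sketch under stated assumptions: SIS dynamics with unit curing rates, and the concrete entries a_12 = a_13 = 1 versus â_12 = 2, â_13 = 0 as one choice consistent with the example; nodes 2 and 3 start from the same state:

```python
import numpy as np

# Two matrices with different topologies produce identical dynamics as long as
# nodes 2 and 3 share the same state, as in Example 1. SIS with delta_i = 1.

def simulate(A, x0, dt=0.01, n_steps=300):
    x = np.array(x0, dtype=float)
    out = [x.copy()]
    for _ in range(n_steps):
        x = x + dt * (-x + (1.0 - x) * (A @ x))   # SIS dynamics, Euler step
        out.append(x.copy())
    return np.array(out)

A_true = np.array([[0.0, 1.0, 1.0],    # a_12 = a_13 = 1
                   [1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
A_hat = np.array([[0.0, 2.0, 0.0],     # a-hat_12 = 2, a-hat_13 = 0
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])

x0 = [0.3, 0.1, 0.1]                   # nodes 2 and 3 start (and stay) equal
X_true = simulate(A_true, x0)
X_hat = simulate(A_hat, x0)
print(np.max(np.abs(X_true - X_hat)))  # the trajectories coincide
```

The rows for nodes 2 and 3 are identical in both matrices, so x_2(t) = x_3(t) holds for all time and condition (8) is satisfied despite Â ≠ A.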
If m = N, then the approximation (9) is exact, because any N × 1 vector x(t) can be written as the linear combination of N orthogonal vectors. Intuitively speaking, if the proper orthogonal decomposition (9) is accurate for m ≪ N modes, then the nodal state vector x(t) is barely agitated. In contrast to the zero-one vectors in Example 1, the agitation modes y_p are usually more complicated. We obtain the agitation modes y_p from the observations of the nodal state dynamics in two steps. First, we define the N × (n + 1) nodal state matrix as

    X = (x(0), x(Δt), ..., x(nΔt)).

Second, we obtain the agitation modes y_1, ..., y_m as the first m left-singular vectors of the nodal state matrix X. At any time t ≥ 0, the scalar functions c_p(t) follow as the inner product c_p(t) = y_p^T x(t). For the dynamics considered here, the approximation (9) is accurate at all times t ∈ [0, T_max]. The number of agitation modes y_p equals m = 15, which is considerably lower than the network size N. We emphasise that the agitation modes y_p are computed from the nodal state x(t) only until the observation time t_obs. Nevertheless, the proper orthogonal decomposition is accurate also at times t ≥ t_obs. Hence, during the observation interval [0, t_obs], the nodal state x(t) quickly locks into only few agitation modes y_p, which govern the dynamics also for future times t ≥ t_obs. For clarity, we stress that the proper orthogonal decomposition (9) cannot be used (directly) to predict the nodal state x(t): in addition to the agitation modes y_p, the coefficients c_p(t) at times t ≥ t_obs require the future nodal state x(t).

The Kuramoto oscillators are the only dynamics in Figure 2 that do not converge to a steady state x_∞. Hence, the proper orthogonal decomposition (9) is not accurate when t ≫ t_obs, which explains why the prediction is least accurate for the Kuramoto model.

Why, precisely, is it not possible to reconstruct the network A? The linear system (3) forms the basis for the network reconstruction.
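The two-step computation of the agitation modes described above can be sketched with a singular value decomposition of the snapshot matrix. The SIS data, network and the choice m = 15 below are illustrative assumptions:

```python
import numpy as np

# Agitation modes y_p as the leading left-singular vectors of the nodal state
# matrix X = (x(0), x(dt), ..., x(n dt)), with coefficients c_p(t) = y_p^T x(t).

rng = np.random.default_rng(3)
N, n, dt = 100, 400, 0.01
A = rng.uniform(0.5, 1.5, (N, N)) * (rng.random((N, N)) < 0.1)
np.fill_diagonal(A, 0.0)
delta = rng.uniform(0.5, 1.5, N)

x = rng.uniform(0.0, 0.1, N)
snapshots = [x.copy()]
for _ in range(n):
    x = x + dt * (-delta * x + (1.0 - x) * (A @ x))   # SIS dynamics
    snapshots.append(x.copy())
X = np.array(snapshots).T                  # N x (n+1) nodal state matrix

U, s, _ = np.linalg.svd(X, full_matrices=False)
m = 15
Y = U[:, :m]                               # agitation modes y_1, ..., y_m
C = Y.T @ X                                # coefficients c_p(k dt)
X_approx = Y @ C                           # rank-m approximation (9)

relative_error = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(relative_error)                      # small: the state is barely agitated
```

Projecting onto the top m left-singular vectors is the optimal rank-m approximation in the Frobenius norm, which is why the POD error is governed by the decay of the singular values.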
The rank of the matrix F_i is essential: If the matrix F_i is of full rank, i.e., rank(F_i) = N, then there is exactly one solution to (3), namely the entries a_i1, ..., a_iN of the true adjacency matrix A. Otherwise, if rank(F_i) < N, then there are infinitely many solutions to (3). If there is more than one solution to (3), then the LASSO estimation (6) results in the sparsest solution (with respect to the ℓ1-norm). We compute the numerical rank for Barabási-Albert random graphs versus the network size N. Here, we consider the best case for the network reconstruction: the derivative dx_i(t)/dt is observed exactly, without any approximation error as in (2). Hence, the system (3) is satisfied with equality. Figure 5 shows that the numerical rank of the matrix F_i stagnates as the network size N grows. Hence, the linear system (3) is severely ill-conditioned for large networks. For example, for the SIS process on a network with N = 1000 nodes, we observe a 1000 × 1001 nodal state sequence x(0), ..., x(1000Δt), but the numerical rank does not exceed 32. The matrix F_i, defined by (5), follows from applying the nonlinear function g to the nodal state x(t). The rank of the matrix F_i is low, because the nodal state x(t) is barely agitated, see Appendix C.

This work considers the prediction of general dynamics on networks, based on past observations of the dynamics. We proposed a prediction framework which consists of two steps. First, the network is estimated from the nodal state observations by the LASSO. The first step seemingly fails, since the estimated network Â bears no topological similarity with the true network. Second, the nodal state is predicted by iterating the dynamical model on the inaccurately estimated network. Counterintuitively, the prediction is accurate! The network reconstruction and prediction accuracy do not match, because the nodal state is barely agitated.
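The numerical-rank experiment behind Figure 5 can be sketched as follows, with Erdős-Rényi stand-in networks instead of Barabási-Albert graphs, SIS dynamics, and an arbitrary rank tolerance:

```python
import numpy as np

# Numerical rank of the matrix F_i versus the network size N: even with many
# observations, the rank stays low because the state occupies few agitation modes.

def numerical_rank_of_F(N, n=200, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.5, 1.5, (N, N)) * (rng.random((N, N)) < 5.0 / N)
    np.fill_diagonal(A, 0.0)
    delta = rng.uniform(0.5, 1.5, N)
    x = rng.uniform(0.0, 0.1, N)
    rows = []
    for _ in range(n):
        rows.append((1.0 - x[0]) * x)     # row of F_i for node i = 0 (SIS g)
        x = x + dt * (-delta * x + (1.0 - x) * (A @ x))
    F = np.array(rows)                    # n x N matrix F_i
    return np.linalg.matrix_rank(F, tol=1e-8 * np.linalg.norm(F))

ranks = {N: numerical_rank_of_F(N) for N in (50, 100, 200, 400)}
print(ranks)   # the numerical rank grows far slower than N
```

The tolerance choice matters: the singular values of F_i decay steeply, so the numerical rank is far below min(n, N) for any reasonable threshold.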
Furthermore, the modes of agitation are hardly related to the network topology. Instead of the true topology, the estimated network Â does capture the interplay with the agitation modes. We conclude with five points. First, the agitation modes depend on the initial nodal dynamics and, particularly, on the initial nodal state x(0). As a result, the estimated network Â depends on the initial state x(0). Thus, as confirmed by numerical simulations, the estimated adjacency matrix may be useless for the prediction of dynamics with a different initial state x̃(0) ≠ x(0). Second, the dynamics (1) are autonomous, since there is no control input. In some applications [40, 26], it may be possible to control the nodal state x(t). Controlling the dynamics would result in more agitation modes of the nodal state x(t). However, physical control constraints, such as power capacities, might limit the number of additional agitation modes. Third, we could observe multiple nodal state sequences x(0), ..., x(nΔt) with different initial states x(0) on the same network. For sufficiently many sequences, we would observe enough agitation modes to reconstruct the network exactly, by stacking the respective linear systems (3). However, Figure 5 shows that the numerical rank of the matrix F_i stagnates for large networks. Thus, the larger the network, the more time series must be observed to reconstruct the adjacency matrix A. Observing a sufficiently large number of time series might not be viable, e.g., for the epidemic outbreak of a novel virus like SARS-CoV-2. Fourth, the proper orthogonal decomposition (9) can be exact. If the network has equitable partitions, then the number of agitation modes equals the number of cells for some dynamical models [41, 42, 43, 44, 45]. Furthermore, the SIS contagion dynamics reduce to only m = 1 agitation mode around the epidemic threshold [46]. Fifth, the dynamics on two different networks Â ≠ A are exactly the same only if the networks satisfy (8).
Based on numerical simulations, we showed that (8) holds approximately because the nodal state is barely agitated. We believe that the combination of the proper orthogonal decomposition (9) and equation (8) is a starting point for a further theoretical analysis of the relation between network structure and dynamics.

Subsection A.1 states the reconstruction algorithm, which is an adaptation of the method we proposed in [26] for discrete-time epidemic models. The interpretation of the LASSO (6) as a Bayesian estimation problem is given in Subsection A.2.

The solution â_i1(ρ_i), ..., â_iN(ρ_i) to the LASSO (6) depends on the regularisation parameter ρ_i. We aim to choose the parameter ρ_i that results in the solution â_i1(ρ_i), ..., â_iN(ρ_i) with the greatest prediction accuracy. To assess the prediction accuracy, we apply hold-out cross-validation [27]: We divide the nodal state observations into a training set x(0), ..., x(n_train Δt) and a validation set x((n_train + 1)Δt), ..., x(nΔt). The training set is used to obtain the solution â_i1(ρ_i), ..., â_iN(ρ_i) as a function of ρ_i, whose prediction accuracy is evaluated on the validation set. We choose the regularisation parameter ρ_i with the greatest prediction accuracy on the validation set. More precisely, we define the training set as the first 80% of the nodal state observations x(0), x(Δt), ..., x(n_train Δt), where n_train = ⌊0.8n⌋. The n_train × 1 training vector V_train,i consists of the first n_train entries of the vector V_i in (4), and the n_train × N training matrix F_train,i consists of the first n_train rows of the matrix F_i in (5). We denote the solution of the LASSO (6) by â_i1(ρ_i), ..., â_iN(ρ_i), when F_i and V_i are replaced by F_train,i and V_train,i, respectively. If an entry â_ij(ρ_i) of the LASSO solution is smaller than the threshold 0.01, then we round off and set â_ij(ρ_i) = 0.
The prediction error MSE(ρ_i) of ρ_i on the validation set is defined as

    MSE(ρ_i) = 1/(n − n_train) ||V_valid,i − F_valid,i (â_i1(ρ_i), ..., â_iN(ρ_i))^T||_2^2.

Here, the (n − n_train) × 1 validation vector V_valid,i and the (n − n_train) × N validation matrix F_valid,i are defined by the nodal state observations x((n_train + 1)Δt), ..., x(nΔt), analogously to the training vector V_train,i and the training matrix F_train,i. We iterate over a set Θ_i, specified below, of predefined candidate values for ρ_i. Every candidate value ρ_i ∈ Θ_i results in a different prediction error MSE(ρ_i). We determine the final regularisation parameter ρ_opt,i as the candidate value ρ_opt,i ∈ Θ_i with the minimal prediction error MSE(ρ_opt,i). We obtain the final estimate â_i1, ..., â_iN as the solution to the LASSO (6) with the regularisation parameter ρ_opt,i, using the matrix F_i and the vector V_i from all nodal state observations x(0), ..., x(nΔt). We define the set Θ_i as 20 logarithmically equidistant candidate values Θ_i = {ρ_min,i, ..., ρ_max,i}. For regularisation parameters ρ_i beyond the threshold ρ_th,i = 2||F_i^T V_i||_∞ [47], the solution to the LASSO (6) equals â_ij = 0 for all nodes j. Thus, we set the candidate values in the set Θ_i proportional to ρ_th,i. We define ρ_min,i = 2||F_i^T V_i||_∞ · 10^−6 and ρ_max,i = 2||F_i^T V_i||_∞ · 10^−2. The network reconstruction method is given by Algorithm 1.

For every node i, we define the error w_i(kΔt) of the first-order approximation (2) of the derivative dx_i(t)/dt at time t = kΔt, such that

    x_i((k + 1)Δt) = x_i(kΔt) + Δt (f_i(x_i(kΔt)) + Σ_{j=1}^N a_ij g(x_i(kΔt), x_j(kΔt))) + w_i(kΔt).    (11)

The approximation (11) can be regarded as a nonlinear system in discrete time k with random model errors w_i(kΔt), which forms the basis for the Bayesian interpretation of the LASSO (6). Furthermore, we rely on two assumptions.

Assumption 1. For every node i at every time k, the approximation error w_i(kΔt) follows the normal distribution N(0, σ_w^2) with zero mean and variance σ_w^2. Furthermore, the approximation errors w_i(kΔt) are stochastically independent and identically distributed at all times k = 1, ..., n and for all nodes i.
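The hold-out cross-validation for ρ_i can be sketched as follows. This is an illustrative implementation under stated assumptions: synthetic linear data in place of the matrices (4)-(5), a generic bound-constrained solver in place of a dedicated LASSO solver, and the candidate grid proportional to 2||F^T V||_∞ as in the text:

```python
import numpy as np
from scipy.optimize import minimize

# Fit the nonnegative LASSO on the first 80% of samples, score each candidate
# rho on the held-out 20%, then refit with the best rho on all data.

rng = np.random.default_rng(5)
n, N = 250, 20
a_true = rng.uniform(0.5, 1.5, N) * (rng.random(N) < 0.3)   # sparse true row
F = rng.random((n, N))
V = F @ a_true + 0.01 * rng.standard_normal(n)              # noisy observations

def fit(F, V, rho):
    obj = lambda a: np.sum((F @ a - V) ** 2) + rho * a.sum()
    res = minimize(obj, np.zeros(N), method="L-BFGS-B", bounds=[(0, None)] * N)
    return res.x

n_train = int(0.8 * n)
F_tr, V_tr = F[:n_train], V[:n_train]
F_va, V_va = F[n_train:], V[n_train:]

rho_th = 2.0 * np.max(np.abs(F.T @ V))               # zero-solution threshold
candidates = np.logspace(-6, -2, 20) * rho_th        # the set Theta_i
mse = [np.mean((F_va @ fit(F_tr, V_tr, r) - V_va) ** 2) for r in candidates]
rho_opt = candidates[int(np.argmin(mse))]

a_hat = fit(F, V, rho_opt)                           # refit on the whole data set
a_hat[a_hat < 0.01] = 0.0                            # round-off threshold
print(rho_opt, np.linalg.norm(a_hat - a_true))
```

With well-conditioned synthetic data the recovery is accurate; for the barely agitated nodal states of the paper, the matrix F_i is ill-conditioned and the same procedure fails topologically while still predicting well.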
    ρ_opt,i ← argmin_{ρ_i ∈ Θ_i} MSE(ρ_i)
    (â_i1, ..., â_iN) ← the solution to (6) for ρ_i = ρ_opt,i on the whole data set F_i, V_i
    â_ij ← 0 for all entries â_i1, ..., â_iN smaller than 0.01
end for

The exact model error w_i(kΔt) is difficult to analyse, since w_i(kΔt) is determined by higher-order derivatives of the nodal state x(t). In contrast, assuming that the model error w_i(kΔt) follows a Gaussian distribution N(0, σ_w^2) allows for a simple analysis. Furthermore, Assumption 1 stems from the maximum entropy principle [48]: Given a set of constraints on a probability distribution (e.g., a specified mean), assume the "least informative" distribution, i.e., the distribution with maximum entropy that satisfies those constraints. Among all distributions on R with zero mean and variance σ_w^2, the Gaussian distribution N(0, σ_w^2) has the maximum entropy [49].

Assumption 2. The adjacency matrix A with non-negative elements a_ij ≥ 0 follows the prior distribution

    Pr[A] = α exp(−Σ_{i=1}^N Σ_{j=1}^N a_ij),    (12)

where the normalisation constant α is set such that the prior distribution (12) integrates to one. Furthermore, the matrix A and the initial nodal state x(0) are stochastically independent.

Clearly, there are more suitable random graph models for real-world networks than the exponential degree distribution in Assumption 2. In particular, the degree distribution of many real-world networks follows a power-law [50]. However, the central result in this work is given by the juxtaposition of Figure 2 with Figure 3: It is not necessary to accurately reconstruct the degree distribution to predict the dynamics on a network. Thus, even with the potentially imprecise assumption on the degree distribution (12), it is possible to accurately predict the dynamics on the network. Proposition 2 states the Bayesian interpretation of the LASSO (6). We emphasise that Proposition 2 is not novel and follows standard arguments in parameter estimation, see for instance [51]. Furthermore, Tibshirani elaborated on the Bayesian interpretation of the LASSO in the seminal paper [52].
Nevertheless, we believe that the presentation of Proposition 2, here in the context of network reconstruction, is valuable to the reader.

Proposition 2. Suppose that Assumption 1 and Assumption 2 hold true and that the nodal state x(t) follows (11). Then, provided that the regularisation parameter equals ρ_i = 2σ_w^2/Δt^2, the matrix Â, which is obtained by solving the LASSO (6) for every node i, coincides with the Bayesian estimate

    Â = argmax_A Pr[A | x(0), ..., x(nΔt)].    (13)

Proof. Analogous steps to the derivations in [53] yield that (13) is equivalent to

    Â = argmax_A ( log Pr[A] + Σ_{k=0}^{n−1} log Pr[x((k + 1)Δt) | x(kΔt), A] ).    (14)

Under Assumption 1, the errors w_i(kΔt) are independent for different nodes i. Thus, we obtain that

    Pr[x((k + 1)Δt) | x(kΔt), A] = Π_{i=1}^N Pr[x_i((k + 1)Δt) | x(kΔt), A].

The probability Pr[x_i((k + 1)Δt) | x(kΔt), A] is determined by the distribution of the error w_i(kΔt). From (1) and (11), it follows that

    w_i(kΔt) = x_i((k + 1)Δt) − x_i(kΔt) − Δt (f_i(x_i(kΔt)) + Σ_{j=1}^N a_ij g(x_i(kΔt), x_j(kΔt))).

With the definition of the vector V_i and the matrix F_i in (4) and (5), respectively, we obtain that the errors w_i(kΔt) equal the entries of the vector Δt(V_i − F_i (a_i1, ..., a_iN)^T). Hence, under Assumption 2 on the prior Pr[A], the optimisation problem (14) becomes

    Â = argmax_{a_ij ≥ 0} ( −Σ_{i=1}^N Σ_{j=1}^N a_ij + log(α) + Σ_{k=0}^{n−1} Σ_{i=1}^N log Pr[x_i((k + 1)Δt) | x(kΔt), A] ).

The term log(α) is constant with respect to the matrix A and can be omitted. Furthermore, the optimisation can be carried out independently for every node i, which yields that

    max_{a_i1, ..., a_iN} Σ_{k=0}^{n−1} log Pr[x_i((k + 1)Δt) | x(kΔt), A] − Σ_{j=1}^N a_ij,  subject to a_ij ≥ 0.

Under Assumption 1, the errors w_i(kΔt) follow a Gaussian distribution, which results in the minimisation problem

    min_{a_i1, ..., a_iN} Σ_{k=0}^{n−1} ( w_i^2(kΔt)/(2σ_w^2) + log(√(2π) σ_w) ) + Σ_{j=1}^N a_ij,  subject to a_ij ≥ 0, j = 1, ..., N.

Omitting the constant term log(√(2π) σ_w) and multiplying with 2σ_w^2/Δt^2 gives

    min_{a_i1, ..., a_iN} ||V_i − F_i (a_i1, ..., a_iN)^T||_2^2 + (2σ_w^2/Δt^2) Σ_{j=1}^N a_ij,  subject to a_ij ≥ 0.

By identifying ρ_i = 2σ_w^2/Δt^2, we obtain the LASSO (6), which completes the proof.

Here, we provide details on the empirical networks and the parameters for the respective network dynamics in Section 2. For every network topology, we obtain the link weights a_ij as follows. If there is a link from node j to node i, then we set the element a_ij to a uniformly distributed random number in [0.5, 1.5]. If there is no link from node j to node i, then we set the respective element to a_ij = 0.
For the competitive population dynamics described by the Lotka-Volterra equations, we consider the Little Rock Lake network [28], which we accessed via the Konect network collection [54]. The asymmetric and connected network consists of N = 183 nodes, which correspond to different species. There are L = 2494 directed links which specify the predation of one species upon another. For every species i, we set the growth parameters α_i and θ_i to uniformly distributed random numbers in [0.5, 1.5]. Furthermore, we set the initial nodal state x_i(0) to a uniformly distributed random number in [0, 1] for every species i. We set the maximum prediction time to T_max = 5.

Kato et al. [29] studied the relationship between 679 insect species and 91 plants in a beech forest in Kyoto by specifying which insects pollinate or disperse which plant. We accessed the insect-plant network via the supplementary data in [30]. The insect-plant network determines a mutualistic insect-insect network [15]: If two insect species i and j pollinate or disperse the same plant, then both insect species i and j contribute to, and benefit from, the abundance of the plant. Thus, if two insect species i and j are linked to the same plant, then we set a_ij to a uniformly distributed random number in [0.5, 1.5], and a_ij = 0 otherwise. As a result, we obtain a symmetric and disconnected network with N = 679 nodes and L = 30,905 links. For every species i, we set the growth parameters α_i and θ_i to uniformly distributed random numbers in [0.5, 1.5]. Furthermore, we set the initial nodal state x_i(0) to a uniformly distributed random number in [0, 20] for every species i. We set the maximum prediction time to T_max = 0.025.

We consider the transcription interactions between regulatory genes in the yeast S. cerevisiae [31]. The asymmetric and disconnected network has N = 620 nodes and L = 869 links. The influence from gene j to gene i is either an activation or an inhibition.
Since the activator interactions account for more than 80% of the links between genes, we only consider activation interactions, see also [12]. In line with Harush and Barzel [15], we set the Hill coefficient to h = 2. We set the initial nodal state x_i(0) to a uniformly distributed random number in [0, 2] for every node i. We set the maximum prediction time to T_max = 3.

The SIS contagion dynamics are evaluated on the contact network of the "Infectious: Stay Away" exhibition [32] between N = 410 individuals, accessed via [54]. The connected and symmetric network has L = 5530 links. A link between two nodes i, j indicates that the respective two individuals had a face-to-face contact that lasted for at least 20 seconds. A crucial quantity for the SIS dynamics is the basic reproduction number R_0, which is defined as [55]

    R_0 = ρ(diag(δ_1, ..., δ_N)^−1 A).    (15)

Here, the spectral radius of an N × N matrix M is denoted by ρ(M), and diag(δ_1, ..., δ_N) denotes the N × N diagonal matrix with the curing rates δ_1, ..., δ_N on its diagonal. If the basic reproduction number R_0 is less than or equal to 1, then the epidemic dies out [19], i.e., x(t) → 0 as t → ∞. We would like to study the spread of a virus that does not die out, and we aim to set the basic reproduction number to R_0 = 1.5: First, we set the "initial curing rate" δ_i^(0) to a uniformly distributed random number in [0.5, 1.5] for every node i. Then, we set the curing rates to δ_i = cδ_i^(0), where the constant c is chosen such that the basic reproduction number in (15) equals R_0 = 1.5. We set the initial nodal state x_i(0) to a uniformly distributed random number in [0, 0.1] for every node i. Furthermore, we set the maximum prediction time to T_max = 0.5.

We consider Kuramoto oscillator dynamics on the structural human brain network [56] of size N = 78. Every node corresponds to a brain region of the automated anatomical labelling (AAL) atlas [57].
The structural brain network specifies the anatomical connectivity between regions, i.e., the physical connections between regions based on white matter tracts. White matter tracts were estimated using fibre tracking from diffusion MRI data from the Human Connectome Project [33], as outlined in [34]. The network is symmetric and has L = 696 links. For every node i, we set the natural frequency ω_i to a normally distributed random number with zero mean and standard deviation 0.1π. Furthermore, we set the initial nodal state x_i(0) to a uniformly distributed random number in [−π/4, π/4] for every node i. We set the maximum prediction time to T_max = 1.

We consider the modified Cowan-Wilson neural firing model of Laurence et al. [13] on the neuronal connectivity of the adult Caenorhabditis elegans hermaphrodite worm. Originally, White et al. [35] compiled the neuronal connectivity of C. elegans. In [36, 58], the neural wiring was updated, which we accessed online via the WormAtlas online database. The somatic nervous system has N = 282 neurons and L = 2994 synapses. A link from node j to node i indicates the presence of at least one synapse from neuron j to neuron i. The slope and the threshold of the neural activation functions are set to τ = 1 and µ = 1, respectively. The initial state x_i(0) of every node i is set to a uniformly distributed random number in [0, 10]. We set the maximum prediction time to T_max = 4.

We denote by r_i(x) the N × 1 vector with entries g(x_i, x_j) for j = 1, ..., N, which constitutes the rows of the matrix F_i (16), and we consider the Taylor series of the interaction function g(x_i, x_j) with coefficients η(α, β) for the powers x_i^α x_j^β (17), (18). The indices i and j refer to the first and second argument of the function g(x_i, x_j). Thus, the coefficient η(α, β) does not depend on the value of the indices i, j.
With (18), it follows from (17) that

    (r_i(x))_j = Σ_{α=0}^∞ Σ_{β=0}^∞ η(α, β) x_i^α x_j^β.    (19)

With (19), we obtain the Taylor series of the function r_i in (16) as

    r_i(x) = Σ_{α=0}^∞ Σ_{β=0}^∞ η(α, β) x_i^α x^{∘β},    (20)

where we denote the element-wise power of the vector x as

    x^{∘β} = (x_1^β, x_2^β, ..., x_N^β)^T.

We define the truncation of the series (20) until the q-th power as

    r_{q,i}(x) = Σ_{k=0}^{q} Σ_{β=0}^{k} η(k − β, β) x_i^{k−β} x^{∘β}.    (21)

For a sufficiently great power q, the matrix F_i is approximated by F_i ≈ F_{q,i}, where we define the matrix F_{q,i} with the truncations r_{q,i}(x) as

    F_{q,i} = ( r_{q,i}(x(0)), r_{q,i}(x(Δt)), ..., r_{q,i}(x((n − 1)Δt)) )^T.

In fact, if the interaction function g(x_i, x_j) is a polynomial of degree q, then the matrices F_i and F_{q,i} coincide, i.e., F_i = F_{q,i}. For instance, it holds that F_i = F_{2,i} for the SIS epidemic process, whose interaction function equals g(x_i, x_j) = (1 − x_i) x_j. The low agitation of the nodal state vector x in (9) indeed explains the ill-conditioning of the matrix F_i:

Proposition 3. Suppose that the N × 1 nodal state vector x(t) equals the linear combination of m vectors y_p at every time t,

    x(t) = Σ_{p=1}^{m} c_p(t) y_p,    (22)

for some scalars c_p(t) ∈ R. Then, the rank of the matrix F_{q,i} is bounded by

    rank(F_{q,i}) ≤ Σ_{β=0}^{q} binom(β + m − 1, m − 1),

where binom(n, k) denotes the binomial coefficient.

Proof. It follows from (22) that x(t) ∈ span(y_1, ..., y_m) at every time t, where span(y_1, ..., y_m) denotes the span of the vectors y_1, ..., y_m. We rewrite the function r_{q,i}(x) in (21) as

    r_{q,i}(x) = Σ_{β=0}^{q} ( Σ_{k=β}^{q} η(k − β, β) x_i^{k−β} ) x^{∘β}.

Both terms η(k − β, β) and x_i^{k−β} are scalars. Thus, we obtain that

    r_{q,i}(x) = Σ_{β=0}^{q} µ_β(x) x^{∘β}    (23)

for some scalars µ_0(x), µ_1(x), ..., µ_q(x) ∈ R. Since the rows of the matrix F_{q,i} are given by (23) for some x ∈ span(y_1, ..., y_m), it holds that

    rank(F_{q,i}) ≤ Σ_{β=0}^{q} dim{ x^{∘β} | x ∈ span(y_1, ..., y_m) }.    (24)

We consider the addends in (24) separately. For any vector x ∈ span(y_1, ..., y_m), it holds that

    x = Σ_{l=1}^{m} c_l y_l

for some scalars c_1, ..., c_m. The multinomial theorem yields that

    x_j^β = Σ_{p_1+p_2+...+p_m=β} (β! / (p_1! p_2! ··· p_m!)) Π_{l=1}^{m} ( c_l (y_l)_j )^{p_l}.

We define the coefficients

    ζ(p_1, ..., p_m) = (β! / (p_1! p_2! ··· p_m!)) Π_{l=1}^{m} (c_l)^{p_l},

which gives that

    x^{∘β} = Σ_{p_1+p_2+...+p_m=β} ζ(p_1, ..., p_m) ν(p_1, ..., p_m).    (26)

Here, we defined the vectors ν(p_1, ..., p_m) = (y_1)^{∘p_1} ∘ (y_2)^{∘p_2} ∘ ... ∘ (y_m)^{∘p_m}, where ∘ denotes the Hadamard product, or element-wise product.
From (26), it follows that the vector x^{∘β} is a linear combination of all vectors ν(p_1, ..., p_m) with p_1 + p_2 + ... + p_m = β, which yields that

    dim{ x^{∘β} | x ∈ span(y_1, ..., y_m) } ≤ binom(β + m − 1, m − 1).    (27)

We complete the proof by combining (24) and (27).

As an example, consider that the nodal state x(t) of the SIS epidemic process is agitated in only m = 10 agitation modes y_p. Then, since F_i = F_{2,i} for the SIS epidemic process, Proposition 3 states that

    rank(F_i) = rank(F_{2,i}) ≤ Σ_{β=0}^{2} binom(β + 10 − 1, 10 − 1),

which yields that rank(F_i) ≤ 66.

References
- Stability and complexity in model ecosystems
- Epidemic processes in complex networks
- Exploring the network dynamics underlying brain activity during rest
- Complex networks: Structure and dynamics
- Dynamical processes on complex networks
- Dynamical systems on networks
- Revealing networks from dynamics: an introduction
- Data based identification and prediction of nonlinear and complex dynamical systems
- Network structure from rich but noisy data
- Network reconstruction and community detection from dynamics
- The architecture of complex weighted networks
- Universality in network dynamics
- Spectral dimension reduction of complex dynamical networks
- Species packing and competitive equilibrium for many species
- Dynamic patterns of information flow in complex networks
- An introduction to systems biology: design principles of biological circuits
- Universal resilience patterns in complex networks
- The Mathematical Theory of Infectious Diseases and Its Applications
- A deterministic model for gonorrhea in a nonhomogeneous population
- Virus spread in networks
- Chemical oscillations, waves, and turbulence
- Excitatory and inhibitory interactions in localized populations of model neurons
- Exploring complex networks
- Statistical learning with sparsity: the lasso and generalizations
- Reconstructing propagation networks with natural diversity and identifying hidden sources
- Network reconstruction and prediction of epidemic outbreaks for general group-based compartmental epidemic models
- On the use of cross-validation for time series predictor evaluation
- Artifacts or attributes? Effects of resolution on the Little Rock Lake food web
- Insect-flower relationship in the primary beech forest of Ashu, Kyoto: an overview of the flowering phenology and the seasonal pattern of insect visits
- Non-random coextinctions in phylogenetically structured mutualistic networks
- Network motifs: simple building blocks of complex networks
- What's in a crowd? Analysis of face-to-face behavioral networks
- The WU-Minn Human Connectome Project: an overview
- How do spatially distinct frequency specific MEG networks emerge from one underlying structural connectome? The role of the structural eigenmodes
- The structure of the nervous system of the nematode Caenorhabditis elegans
- Wiring optimization can relate neuronal structure and function
- An introduction to ROC analysis
- Approximation of large-scale dynamical systems
- The method of proper orthogonal decomposition for dynamical characterization and order reduction of mechanical systems: an overview
- Revealing network connectivity from response dynamics
- Graph spectra for complex networks
- Observability and coarse graining of consensus dynamics through the external equitable partition
- Epidemic outbreaks in networks with equitable or almost-equitable partitions
- Graph partitions and cluster synchronization in networks of oscillators
- Non-linear network dynamics with consensus-dissensus bifurcation
- Time-dependent solution of the NIMFA equations around the epidemic threshold
- An interior-point method for large-scale l1-regularized least squares
- Information theory and statistical mechanics
- Probability, random variables, and stochastic processes
- Parameter estimation and inverse problems
- Regression shrinkage and selection via the lasso
- Exact network reconstruction from complete SIS nodal state infection information seems infeasible

We are grateful to Prejaas Tewarie for providing data on the structural brain network.

We argue that if the proper orthogonal decomposition (9) is accurate, then the matrix F_i in (5) is ill-conditioned. We rewrite the matrix F_i as

    F_i = ( r_i(x(0)), r_i(x(Δt)), ..., r_i(x((n − 1)Δt)) )^T,

where the rows are given by the N × 1 vectors

    r_i(x(kΔt)) = ( g(x_i(kΔt), x_1(kΔt)), ..., g(x_i(kΔt), x_N(kΔt)) )^T.    (16)

We aim to show that the approximation of the nodal state x(t) in (9) implies that the row vectors r_i(x(kΔt)), where k = 0, 1, ..., n − 1, can be approximated by the linear combination of only few vectors, which implies the ill-conditioning of the matrix F_i. To shorten the notation, we drop the time index t in this section.
More precisely, we formally replace the nodal state x(t) by x and the functions g(x_i(t), x_j(t)) and r_i(x(t)) by g(x_i, x_j) and r_i(x), respectively. To analyse the nonlinear dependency of the rows r_i(x) on the nodal state x, we resort to a Taylor expansion of the rows r_i(x). The function r_i : R^N → R^N is specified by the nonlinear interaction function g of the dynamical model (1). The Taylor expansion of the function g(x_i, x_j) around the point x_i = x_j = 0 reads

    g(x_i, x_j) = Σ_{α=0}^∞ Σ_{β=0}^∞ η(α, β) x_i^α x_j^β.    (17)

We define the coefficient η(α, β) as

    η(α, β) = (1 / (α! β!)) ∂^{α+β} g(x_i, x_j) / (∂x_i^α ∂x_j^β) evaluated at x_i = x_j = 0.    (18)
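The rank bound of Proposition 3 can be verified numerically. The following sketch (hypothetical random agitation modes; the SIS interaction g(x_i, x_j) = (1 − x_i) x_j from the text, for which F_i = F_{2,i}) builds the matrix F_{q,i} from states confined to the span of m = 10 modes and checks that its rank never exceeds the bound of 66:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
N, m, q = 100, 10, 2                      # network size, number of agitation modes, q = 2 for SIS

# Nodal states confined to span(y_1, ..., y_m), as in Proposition 3:
Y = rng.standard_normal((N, m))           # the m agitation modes as columns (hypothetical)
n_obs = 500
X = Y @ rng.standard_normal((m, n_obs))   # column k is the state x(k * dt)

def sis_row(x, i=0):
    # Row r_i(x) of F_i for the SIS interaction g(x_i, x_j) = (1 - x_i) x_j:
    # (r_i(x))_j = (1 - x_i) x_j, i.e. a scalar multiple of the state vector x.
    return (1 - x[i]) * x

F = np.stack([sis_row(X[:, k]) for k in range(n_obs)])
bound = sum(comb(beta + m - 1, m - 1) for beta in range(q + 1))  # 1 + 10 + 55 = 66
rank = np.linalg.matrix_rank(F)
print(rank <= bound)  # True: the rank (here at most m = 10) never exceeds 66
```

For the SIS interaction, every row is a scalar multiple of x itself, so the observed rank stays far below the general bound; an interaction with genuine quadratic terms in x_j would come closer to 66, but never exceed it.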