key: cord-0022127-hvkswn23
authors: Gong, Xue; Higham, Desmond J.; Zygalakis, Konstantinos
title: Directed network Laplacians and random graph models
date: 2021-10-13
journal: nan
DOI: 10.1098/rsos.211144
sha: 1bcaced0be68a820602d601e9c46bf075e32f2f5
doc_id: 22127
cord_uid: hvkswn23

We consider spectral methods that uncover hidden structures in directed networks. We establish and exploit connections between node reordering via (a) minimizing an objective function and (b) maximizing the likelihood of a random graph model. We focus on two existing spectral approaches that build and analyse Laplacian-style matrices via the minimization of frustration and trophic incoherence. These algorithms aim to reveal directed periodic and linear hierarchies, respectively. We show that reordering nodes using the two algorithms, or mapping them onto a specified lattice, is associated with new classes of directed random graph models. Using this random graph setting, we are able to compare the two algorithms on a given network and quantify which structure is more likely to be present. We illustrate the approach on synthetic and real networks, and discuss practical implementation issues.

financial contagion [8] . Similarly, periodic hierarchies have been associated with sustainability and risk management issues in commerce [9] , and also with the existence of echo chambers in online social media [10] .

Of course, on real data, these structures may not be so pronounced; hence in addition to visualizing the reordered network, we are interested in quantifying the relative strength of each type of signal. Laplacian-based methods are often motivated from the viewpoint of optimizing an objective function. This work focuses on two such methods. Minimizing frustration leads to the magnetic Laplacian which may be used to reveal periodic hierarchy [5, 11] . Minimizing trophic incoherence leads to what we call the trophic Laplacian, which may be used to reveal linear hierarchy [6] . We will exploit the idea of associating a spectral method with a generative random graph model. This in turn allows us to compare the outputs from spectral methods based on the likelihood of the associated random graph. This connection was proposed in [12] to show that the standard spectral method for undirected networks is equivalent to maximum-likelihood optimization assuming a class of range-dependent random graphs (RDRGs) introduced in [13] . The idea was further pursued in [14] , where a likelihood ratio test was developed to determine whether a network with RDRG structure is more linear or periodic.

The main contributions of this work are as follows.

-We propose two new directed random graphs models. One model has the unusual property that the probability of an i → j connection is not independent of the probability of the reciprocated j → i connection. -We establish connections between these random graph models and algorithms from [6, 11] that use the magnetic Laplacian and trophic Laplacian, respectively, by showing that reordering nodes or mapping them onto a specific lattice structure using these algorithms is equivalent to maximizing the likelihood that the network is generated by the models proposed. -We show that by calibrating a given network to both models, it is possible to quantify the relative presence of periodic and linear hierarchical structures using a likelihood ratio. -We illustrate the approach on synthetic and real networks.

The rest of the paper is organized as follows. In the next section, we introduce the magnetic and trophic Laplacian algorithms. Section 3 defines the new classes of random directed graphs and establishes their connection to these spectral methods. Illustrative numerical results on synthetic networks are given in §4, and in §5, we show results on real networks from a range of applications areas. We finish with a brief discussion in §6.

We consider an unweighted directed graph G = (V, E) with node set V and edge set E, with no selfloops. The adjacency matrix A is n × n with A ij = 1 if the edge i → j is in E, and A ij = 0 otherwise. It is royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144 convenient to define the symmetrized adjacency matrix W (s) = (A + A T )/2. The symmetrized degree matrix D is diagonal with

is the average of the in-degree and out-degree of node i. Later, we will consider weighted networks for which each edge i → j has associated with it a non-negative weight w ij . In this case, we let A ij = w ij . We use i to denote ffiffiffiffiffiffi ffi À1 p , and we write x H to denote the conjugate transpose of a vector x [ C n . We use P to denote the set of all permutation vectors, that is, all vectors in R n with distinct components given by the integers 1, 2, …, n.

Spectral methods explore properties of graphs through the eigenvalues and eigenvectors of associated matrices [1, 2, 15, 16] . In the undirected case, the standard graph Laplacian L = D − A is widely used for clustering and reordering, along with normalized variants. The directed case has received less attention; however, several extensions of the standard Laplacian have been proposed [7] . We focus on two spectral methods for directed networks, which are discussed in the next two subsections: the magnetic Laplacian algorithm, which reveals periodic flow structures [5, 11] , and the trophic Laplacian algorithm, which reveals linear hierarchical structures [6] . We choose to study these two algorithms because they have an optimization formulation and, as we show in §3, may be interpreted in terms of random graph models. Here, we briefly mention two other related techniques that do not fit naturally into this framework. The Hermitian matrix method groups nodes into clusters with a strong imbalance of flow between clusters [4] . This approach constructs a skew-symmetric matrix that emphasizes net flow between pairs of nodes but ignores reciprocal edges. A spectral clustering algorithm motivated by random walks was derived in [17] leading to a graph Laplacian for directed networks that was proposed earlier in [18] .

Given a network and a vector of angles θ = (θ 1 , θ 2 , …, θ n ) T in [0, 2π), we may define the corresponding frustration

where δ ij = −2πgα ij with g [ ½0, 1 2 . Here, α ij = 0 if the edge between i and j is reciprocated, that is A ij = A ji = 1; α ij = 1 if the edge i → j is unreciprocated, that is A ij = 1 and A ji = 0; and α ij = −1 if the edge j → i is unreciprocated, that is A ij = 0 and A ji = 1. For convenience, we also set α ij = 0 if i and j are not connected. To understand the definition (2.1), suppose that for a given graph we wish to choose angles that produce low frustration. Each term W ðsÞ ij je iui À e idij e iu j j 2 in (2.1) can make a positive contribution to the frustration if W ðsÞ ij = 0; that is, if i and j are involved in at least one edge. In this case, if there is an edge from i to j that is not reciprocated, then we can force this term to be zero by choosing θ j = θ i + 2πg. If the edge is reciprocated, then we can force the term to be zero by choosing θ j = θ i . Hence, intuitively, choosing angles to minimize the frustration can be viewed as mapping the nodes into directed clusters on the unit circle in such a way that (a) nodes in the same cluster tend to have reciprocated connections and (b) unreciprocated edges tend to point from source nodes in one cluster to target nodes in the next cluster, periodically. Setting the parameter g = 1/k for some positive integer k indicates that we are looking for k directed clusters.

On a real network, it is unlikely that the frustration (2.1) can be reduced to zero, but it is of interest to find a set of angles that give a minimum value. This minimization problem is closely related to the angular synchronization problem [19, 20] , which estimates angles from noisy measurements of their phase differences u i À u j mod 2p. Moreover, we note that for visualization purposes it makes sense to reorder the rows and columns of the adjacency matrix based on the set of angles that minimizes the frustration. We also note that in [11] the expression in (2.1) for the frustration is normalized through a division by 2 P i d i . This is immaterial for our purposes, since that denominator is independent of the choice of θ.

The frustration (2.1) is connected to the magnetic Laplacian, which is defined as follows, where A B denotes the elementwise, or Hadamard, product between matrices of the same dimension; that is,

royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144 Definition 2.1. Given g [ ½0, 1 2 , the magnetic Laplacian L (g) [5, 11] is defined as

Here, the transporter matrix T (g) assigns a rotation to each edge according to its direction.

It is straightforward to show that L (g) is a Hermitian matrix. When g = 0 and the graph is undirected, the magnetic Laplacian reduces to the standard graph Laplacian.

The following result, which is implicit in [5, 11] , shows that the frustration (2.1) may be written as a quadratic form involving the magnetic Laplacian.

ij j e iu i À e id ij e iu j j 2 : ð2:2Þ

Appealing to the Rayleigh-Ritz theorem [21] the quadratic form on the left-hand side of (2.2) is minimized over all c [ C n with kck 2 ¼ 1 by taking c to be an eigenvector corresponding to the smallest eigenvalue of the magnetic Laplacian. Now, such an eigenvector will not generally be proportional to a vector with components of the form fe iu j g n j¼1 . However, a useful heuristic is to force this relationship in a componentwise sense; that is, to assign to each θ j the phase angle of ψ j , effectively solving a relaxed version of the desired minimization problem. This leads to algorithm 1, as used in [11] .

The idea of discovering a linear directed hierarchy arises in many contexts where edges represent dominance or approval, including the ranking of sports teams [22] and Web pages [23] . A particularly well-defined case is the quantification of trophic levels in food webs, where each directed edge represents a consumer-resource relationship [24] [25] [26] . We focus here on the approach in [6] , where the aim is to assign a trophic level h i to each node i such that along any directed edge the trophic level increases by one. This motivates the minimization of the trophic incoherence

Denoting the total weight of node i as v i ¼ P j[V ðA ji þ A ij Þ and the imbalance as x i ¼ P j[V ðA ji À A ij Þ, the trophic level vector h [ R n that minimizes the trophic incoherence solves the linear system of equations

where L ¼ diagðvÞ À A À A T , and the solution to (2.4) is unique up to a constant shift [6] . Since it employs a Laplacian-style matrix, L, we refer to it as the trophic Laplacian algorithm; see algorithm 2.

Result: Phase angles of nodes u Input adjacency matrix A; 

In this section, we associate two new random graph models with the magnetic and trophic Laplacian algorithms, using a similar approach to the work in [12] . After establishing these connections, we proceed as in [14] and propose a maximum-likelihood test to compare the two models on a given network.

Given a set of phase angles fu i g n i¼1 , we will define a model for unweighted, directed random graphs. The model generates connections between each pair of distinct nodes i and j with four possible outcomes-a pair of reciprocated edges, an unreciprocated edge from i to j, an unreciprocated edge from j to i, or no edges-as follows:

and

where f, q and l are functions that define the model, and, of course, they must be chosen such that all probabilities lie between zero and one. We emphasize that this model has a feature that distinguishes it from typical random graph models, including directed Erdos-Rényi and small-world style versions [27] : the probability of the edge i → j is not independent of the probability the edge j → i, in general. We are interested here in the inverse problem where we are given a graph and a model (3.1)-(3.4), and we wish to infer the phase angles. This task arises naturally when the nodes are supplied in some arbitrary order. We will assume that the phase angles are to be assigned values from a discrete set fn i g n i¼1 ; that is, we must set u i ¼ n p i , where p is a permutation vector. This setting includes the cases of (directed) clustering and reordering. For example, with n = 12, we could specify ν 1 = ν 2 = ν 3 = 0, ν 4 = ν 5 = ν 6 = π/2, ν 7 = ν 8 = ν 9 = π, and ν 10 = ν 11 = ν 12 = 3π/2, in order to assign the nodes to four directed clusters of equal size. Alternatively, ν i = (i − 1)2π/12 would assign the nodes to equally spaced phase angles, as shown in figure 2a, as a means to reorder the graph. The following theorem shows that solving this type of inverse problem for suitable f, q and l is equivalent to minimizing the frustration.

where p is a permutation vector. Then minimizing the frustration η(θ) in (2.1) over all such θ is equivalent to maximizing the likelihood that the graph came from a model of the form (3.1)- (3.4) in the case where

and lðu i ,

for any positive constant γ.

Algorithm 2. Trophic Laplacian algorithm.

Calculate the node weights

Reorder or visualize nodes using h royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144

Proof. We first note that, since

ii ¼ 0, we may express η(θ) (equation (2.1)) in terms of a sum over ordered pairs:

Then, distinguishing between the three different ways in which each i and j may be connected, we have

The likelihood L of the graph G from a model of the form (3.1)-(3.4) is given by

which we may rewrite as

The final factor on the right-hand side, which is the probability of the null graph, takes the same value for any u [ R n such that u i ¼ n pi , since each ordered pair of arguments appears exactly once. We may therefore ignore this factor when maximizing the likelihood. Then, taking the logarithm and negating, we see that maximizing the likelihood is equivalent to minimizing the expression royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144

Comparing terms in (3.8)-(3.10) and (3.6)-(3.7), we see that the two minimization problems are equivalent if

and

where we may choose any positive constant γ since the minimization problems are scale invariant. Solving for f, q and l as functions of θ i and θ j we arrive at the model in the statement of the theorem. ▪

For the model in theorem 3.1, the probability of an edge from node i to node j depends on the phase difference β ij = θ i − θ j , the decay rate γ, and the parameter g. We see that γ determines how rapidly the edge probability varies with the phase difference. In the extreme case when γ = 0, we obtain f(θ i , θ j ) = q(θ i , θ j ) = l(θ i , θ j ) = 1/4, and thus the model reduces to a conditional Erdos-Rényi form. In addition, as γ increases the graph generally becomes more sparse. This is because the likelihood of disconnection, exp [2γ(1 − cos(θ i − θ j ))]/Z ij , is greater than or equal to that of the other cases.

We note that having applied the magnetic Laplacian algorithm to estimate θ, there are two straightforward approaches to estimating γ. One way is to maximize the graph likelihood over γ > 0. Another is to choose γ so that the expected edge density from the random graph model matches the edge density of the given network. We illustrate these approaches in §4.

Remark 3.2. Since the edge probabilities are functions of the phase differences and have a periodicity of 2π, this model resembles the periodic range-dependent random graph ( pRDRG) model in [14] , which generates an undirected edge between i and j with probability f (min{|j − i|, n − |j − i|}) for a given decay function f. We will therefore use the term directed periodic range-dependent random graph model (directed pRDRG) to describe the model in theorem 3.1.

Now, given a set of trophic levels fh i g n i¼1 , we define an unweighted, directed random graph model where

for some function f. Here, the probability of an edge i → j is independent of the probability of the edge j → i. Following our treatment of the directed pRDRG case, we are now interested in the inverse problem where we are given a graph and the model (3.11)-(3.12), and we wish to infer the trophic levels. We will assume that the trophic levels are to be assigned values from a discrete set fn i g n i¼1 ; that is, we must set h i ¼ n p i , where p is a permutation vector. This setting includes the cases of assignment of nodes to trophic levels of specified size; for example, with n = 12, we could set ν 1 = ν 2 = ν 3 = 1, ν 4 = ν 5 = ν 6 = 2, ν 7 = ν 8 = ν 9 = 3 and ν 10 = ν 11 = ν 12 = 4, in order to assign the nodes to four equal levels. Alternatively, ν i = i would assign each node to its own level, which is equivalent to reordering the nodes. The following theorem shows that solving this type of inverse problem for suitable f is equivalent to minimizing the trophic incoherence. likelihood that the graph came from a model of the form (3.11)- (3.12) in the case where

for any positive γ.

Proof. Noting that the denominator in (2.3) is independent of the choice of h, this result is a special case of theorem 3.5 below, with

For the model in theorem 3.3, the probability of an edge i → j is a function of the shifted, directed, squared difference in levels, (h j − h i − 1) 2 . The larger this value, the lower the probability. Within the same level, where h i = h j , the probability is 1=ð1 þ e g Þ. The edge probability takes its maximum value of 1/2 when h j − h i = 1, that is, when the edge starts at one level and finishes at the next highest level. We also see that the overall expected edge density is always smaller than 1/2. Across different levels, where h i ≠ h j , the edge i → j and the edge j → i are not generated with the same probability. If |h j − h i − 1| < |h i − h j − 1|, the edge i → j is more likely than j → i. The two edge probabilities are equal if and only if h i = h j . Therefore, this model could be interpreted as a combination of an Erdos-Rényi model within the same level and a periodic range-dependent model across different levels.

The parameter γ controls the decay rate of the likelihood as the shifted, directed, squared difference in levels increases. When h j − h i = 1, γ plays no role. If γ = 0, the model reduces to Erdos-Rényi with an edge probability of 1/2. As γ → ∞, the edge probability tends to zero if h j − h i ≠ 1. In this case, the model will generate a multipartite graph where edges are only possible in one direction between adjacent levels, and this happens with probability 1/2. As mentioned previously in §3.1 and illustrated in §4, γ can be fitted from a maximum likelihood estimate or by matching the edge density.

We note that the definition of trophic incoherence in (2.3) and the resulting trophic Laplacian algorithm make sense for a non-negatively weighted graph, in which case we have the following result. Here, to be concrete we assume that weights lie strictly between zero and one. Similar results can be obtained for weights from a discrete distribution. for any positive γ, where Z ij ¼ ð1 À e Àgðh j ÀhiÀ1Þ 2 Þ=ðgðh j À h i À 1Þ 2 Þ is a normalization factor.

Proof. This is a special case of theorem 3.6 below, where

The results in § §3.1 and 3.2 exploit the form of the objective function: the sum over all edges of a kernel function can be viewed as the sum of log-likelihoods. This shows that the minimization problem is equivalent to maximizing the likelihood of an associated random graph model, in the setting where we assign nodes to a discrete set of scalar values. The restriction to discrete values is used in the proofs to make the probability of the null graph constant. However, we emphasize that in practice the relaxed versions of the optimization problems, which are solved by the two algorithms, do not have this restriction. The magnetic Laplacian algorithm produces real-valued phase angles and the trophic Laplacian algorithm produces real-valued trophic levels.

We may extend the connection in theorem 3.3 to the case of higher dimensional node attributes, that is, where we wish to associate each node with a discrete vector from a set fn ½k g n k¼1 , where each n ½k [ R d for some d ≥ 1. This setting arises, for example, if we wish to visualize the network in higher dimension; a natural extension of the ring structure would be to place nodes at regularly spaced points on the surface of the unit sphere, see figure 2b, which we produced with the algorithm in [28] . The next result generalizes theorem 3.3 to this case.

Theorem 3.5. Suppose we have an unweighted directed graph with adjacency matrix A and a kernel function over all such fh ½k g n k¼1 is equivalent to maximizing the likelihood that the graph came from a model where the (independent) probability of the edge i → j is

for any positive γ.

Proof. Given fh ½k g n k¼1 , the probability of generating a graph G from the model stated in the theorem is

The second factor on the right-hand side, the probability of the null graph, does not depend on the choice of fh ½k g n k¼1 . So we may ignore this factor, and after taking logs and negating we arrive at the equivalent problem of minimizing

Comparing (3.16) and (3.14), we see that two minimization problems have the same solution when

for any positive γ, and the result follows. ▪ For the model in theorem 3.5, given fh ½k g n k¼1 the edge i → j appears according to a Bernoulli distribution with probability f ðh ½i , h ½j Þ, and hence with variance f ðh ½i , h ½j Þ½1 À f ðh ½i , h ½j Þ ¼ e gIðh ½i ,h ½j Þ ½1 þ e gIðh ½i ,h ½j Þ 2 :

When Iðh ½i , h ½j Þ ¼ 0 the probability is 1/2 and the variance takes its largest value, 1/4. The edge probability is symmetric about i and j if and only if the function I is symmetric about its arguments. In the case of squared Euclidean distance, Iðh ½i , h ½j Þ ¼ kh ½i À h ½j k 2 , and an undirected graph, the relaxed version of the minimization problem is solved by taking d eigenvectors corresponding to the smallest eigenvalues of the standard graph Laplacian.

For completeness, we now state and prove a weighted analogue of theorem 3.5 assuming that weights lie strictly between zero and one. Discrete-valued weights may be dealt with similarly. Theorem 3.6. Suppose fh ½k g n k¼1 may take values from the given set fn ½k g n k¼1 ; that is, h ½k ¼ n ½p k [ R d , where p is a permutation vector. Then, given a weighted graph with weights in (0, 1), minimizing the expression (3.14) over all such fh ½k g n k¼1 is equivalent to maximizing the likelihood that the graph came from a model where A ij has (independent) density Proof. It is straightforward to check that the normalization factor Z ij ensures ð 1 y¼0 f ij ðyÞ dy ¼ 1:

Now the product over all pairs ∏ i,j Z ij is independent of the choice of permutation vector p. Hence, under the model defined in the theorem, maximizing the likelihood of the graph G is equivalent to maximizing ∏ i,j f ij (A ij ). After taking logarithms and negating, we see that the choice (3.17) allows us to match (3.14). ▪ Remark 3.7. It is natural to ask whether the frustration (2.1) fits into the form (3.14), and hence has an associated random graph model of the form (3.15). We see from (3.5) that the frustration may be written

However, the factor je iu i À e id ij e iu j j 2 depends (through δ ij ) on A ij , and hence we do not have an expression of the form (3.14). This explains why a new type of model, with conditional dependence between the i → j and j → i connections, was needed for theorem 3.1.

The random graph models appearing in §3 capture the characteristics of linear and periodic directed hierarchies. Hence it may be of interest (a) to analyse properties of these models and (b) to use these models to evaluate the performance of computational algorithms. However, in the remainder of this work we focus on a follow-on topic of more direct practical significance. The magnetic Laplacian and trophic Laplacian algorithms allow us to compute node attributes θ and h in R n for a given graph, leading to unsupervised node ordering. The main computation required in this step is finding dominant eigenvector-eigenvalue pairs. Assuming that the network is sparse (each node has an O(1) degree) and that the power method gives the required accuracy in a finite number of iterations, this is an OðnÞ computation. Motivated by theorems 3.1 and 3.3, we may then compute the likelihood of the graph for this choice of attributes, which has a complexity of Oðn 2 Þ. By comparing likelihoods, we may quantify which underlying structure is best supported by the data. An extra consideration is that both random graph models involve a free parameter, γ > 0, which is needed to evaluate the likelihood. As discussed earlier, one option is to fit γ to the data, for example by matching the expected edge density from the model with the edge density of the given graph. However, based on our computational tests, we found that a more reliable approach was to choose the γ that maximizes the likelihood, once the node attributes were available; see § §4 and 5 for examples. Our overall proposed workflow for model comparison is summarized in algorithm 3.

In this section, we demonstrate the model comparison workflow on synthetic networks. These networks are generated using the directed pRDRG model and the trophic RDRG model. Hence, we have a 'ground truth' concerning whether a network is more linear or periodic. Note that the magnetic Laplacian algorithm and associated random graph model have a parameter g that controls the spacing between Algorithm 3. Model comparison.

for Candidate spectral methods do Compute node attributes (in our case with magnetic and trophic Laplacian algorithms);

Derive the associated random graph model; Calculate maximum likelihood over g . 0; end Report or compare maximum likelihoods royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144

clusters. Therefore, when using the magnetic Laplacian algorithm our first step is to select the parameter g based on the maximum likelihood of the graph.

We generate a synthetic network using the directed pRDRG model with K clusters of size m, and hence n = m K nodes. An array of angles u [ R n is created, forming evenly spaced clusters C 1 , C 2 , …, C K . This is achieved by letting θ i = (2π(l − 1)/K ) + σ if i ∈ C l , where σ ∼ unif(−a, a) is added noise. We then construct the adjacency matrix according to the probabilities in theorem 3.1 with g = 1/K. We choose m = 100, K = 5, γ = 5 and a = 0.2 and the corresponding adjacency matrix is shown in figure 3a .

The magnetic Laplacian algorithm is then applied to the adjacency matrix to estimate phase angles and reorder the nodes. The reordered adjacency matrix (figure 3b) recovers the original structure. The trophic Laplacian algorithm is also applied to estimate the trophic level of each node. Figure 3c shows the adjacency matrix reordered by the estimated trophic levels, which hides the original pattern. Intuitively, the trophic Laplacian algorithm is unable to distinguish between these nodes since there is no clear 'lowest' or 'highest' level among the directed clusters. Figure 3d illustrates how the optimal parameter g is selected. The plots show the likelihood that the network is generated by a directed pRDRG model for g ¼ 1 2 , 1 3 , 1 4 , 1 5 , 1 6 , assuming we are interested in structures with at most 6 directed clusters. We see that g ¼ 1 5 has the highest maximum likelihood, as expected. Consequently, we choose g = 1/5 for the magnetic Laplacian algorithm. In addition for this value of g, we plot in figure 3e the phase angles estimated with the magnetic Laplacian algorithm against the true phase angles. The linear relationship confirms that the algorithm recovers the five clusters in the presence of noise.

We finally in figure 3f compare the likelihood of a directed pRDRG against the likelihood of a trophic RDRG. Both likelihoods are calculated using several test points for γ. The highest points are highlighted with circles and they correspond to the maximum-likelihood estimators (MLE) for γ. Not surprisingly, in this case, the magnetic Laplacian algorithm achieves a higher maximum. Asterisks highlight the point estimates arising when the expected number of edges is matched to the actual number of edges. We see here, and also observed in similar experiments, that the maximum-likelihood estimate for γ produces a more accurate result. We also found (numerical experiments not presented here) that the accuracy of both types of γ estimates improves as n increases when using the magnetic Laplacian algorithm.

Following on from the previous subsection, we now generate synthetic data by simulating the trophic RDRG model with levels C 1 , C 2 , …, C K , where each level has m nodes. In particular, we generate an array of trophic indices h [ R n , where the total number of nodes is n = m K. We let royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144 , a) is added noise. The edges are then generated according to the probabilities in theorem 3.3. In the following example, we use K = 5, m = 100, a = 0.2 and γ = 5. This generates a network with five clusters forming a linear directed flow, as shown in figure 4a . We see in figure 4c that the trophic Laplacian algorithm recovers the underlying pattern. Figure 4b shows that the magnetic Laplacian algorithm also gives adjacent locations to nodes in the same cluster, and places the clusters in order, modulo a 'wrap-around' effect that arises due to its periodic nature. Figure 4d suggests that the optimal magnetic Laplacian parameter is g = 1/6. For this case, it is reasonable that g = 1/K is not identified, since the disconnection between the first and the last cluster contradicts the structure of the directed pRDRG model.

The trophic levels estimated using the trophic Laplacian are consistent with the true trophic levels, as shown by the linear pattern in figure 4e. As expected, the trophic Laplacian produces a higher maximum likelihood for this network (figure 4f ) and a more accurate MLE and point estimate for γ. We observe (in similar experiments not presented here) that when using the trophic Laplacian, the accuracy of both estimates increases using the trophic Laplacian.

We now discuss practical use cases for the model comparison tool on a range of real networks. We emphasize that the tool is not designed to discover whether a given directed network has linear or directed hierarchical structure; rather it aims to quantify which of the two structures is best supported by the data in a relative sense. Since both models under investigation assume no self-loops, we discard these if they are present in the data. Following common practice, we also preprocess by retaining the largest strongly connected component to emphasize directed cycles. This ensures that any pair of nodes can be connected through a sequence of directed edges. However, when the strongly connected component contains too few nodes, we analyse the largest weakly connected component instead.

We give details on four networks, covering examples of the two cases where linear and periodic structure dominates. For the first two networks, we show network visualizations to illustrate the results further. In §5.5, we present summary results over 15 networks.

In the Florida Bay food web 1 [29] , nodes are components of the system, and unweighted directed edges represent carbon transfer from the source nodes to the target nodes [30] , which usually means that the latter feed on the former. Besides organisms, the nodes also contain non-living components, such as carbon dissolved in the water column. Since we are more interested in the relationship between organisms, we remove those non-living components from the network. We analyse the largest strongly connected component of the network, which comprises 12 nodes and 28 edges.

We estimate the phase angles of each node using the magnetic Laplacian algorithm based on the optimal choice g = 1/3 (figure 5a). Figure 5b compares the likelihood of the food web being generated by the directed pRDRG model with the likelihood of it being generated by the trophic RDRG model, as γ varies. The directed pRDRG model achieves a higher maximum likelihood, suggesting that the structure is more periodic than linear. In figure 5c , the heights of the nodes correspond to their estimated trophic levels on a vertical axis. We see that 22 edges point upwards, these are shown in blue. There are six downward edges, highlighted in red, which violate the trophic structure. The magnetic Laplacian mapping in figure 5d arranges 26 edges in a counterclockwise direction, shown in blue, with 2 edges, shown in red, violating the structure and pointing in the reverse orientation.

With g = 1/3, the magnetic Laplacian mapping is encouraging cycles in the food chain, and these are visible in figure 5d , notably between members of three categories: (i) flatfish and other demersal fishes; (ii) lizardfish and eels; and (iii) toadfish and brotalus. Another noticeable distinction is that the magnetic Laplacian mapping positions eels close to lizardfish, and flatfish near other demersal fishes by accounting for the reciprocal edges, while the trophic Laplacian mapping places them further apart. In figure 5e ,f, we show the reordered adjacency matrix arising from the two algorithms. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144 13 

The influence matrix we study quantifies the influence of selected system factors in the Motueka Catchment of New Zealand [31] . The original influence matrix consists of integer scores between 0 and 5, measuring to what extent the row factors influence the column factors, where a bigger value represents a stronger impact. The system factors and influence scores were developed by pooling the views of local residents. To convert to an unweighted network, we binarize the weights by keeping only the edges between each factor and the factor(s) it influences most strongly. We then select the largest strongly connected component, which comprises 14 nodes and 35 edges. The optimal parameter for the magnetic Laplacian is g = 1/4 (figure 6a). The mapping from the magnetic Laplacian has a higher maximum likelihood than the trophic Laplacian mapping, indicating a more periodic structure (figure 6b). The trophic Laplacian mapping in figure 6c aims to reveal a hierarchical influence structure. Here, scientific research and economic inputs are assigned lower trophic levels, suggesting that they are the fundamental influencers. The labour market is placed at the top, indicating that it tends to be influenced by other factors. However, there are eight edges, highlighted in red, that point downwards, violating the directed linear structure.

On the other hand, the magnetic Laplacian mapping in figure 6d aims to reveal four directed clusters with phase angles of approximately 0, π/2, π, 3π/2. We highlight the nodes corresponding to ecological factors in red and socio-economic factors in blue. The cluster near π/2 with 6 nodes contains a combination of ecological and socio-economic factors, and includes 6 reciprocal edges between Overall, the pattern agrees with the conceptual schematic model proposed in [31, fig. 5a ], which we have reproduced in figure 7 . This model posits that ecological factors exert influence on socio-economic factors, which in turn influence ecological factors, while the ecological system also influences itself.

We now analyse a gene transcriptional regulation network 2 [29] for a type of yeast called S. cerevisiae [32] , where a node represents an operon made up of a group of genes in mRNA. An edge from operon i to j indicates that the transcriptional factor encoded by j regulates i. The original network is directed and signed, with signs indicating activation and deactivation. Here, we ignore the signs and only consider the connectivity pattern. Since the largest strongly connected component has very few nodes, we take the largest weakly connected component, which comprises 664 nodes and 1078 edges. This is a very sparse network and consequently the log-likelihood of the directed pRDRG (figure 8a) keeps increasing as a function of the decay rate parameter γ in the range we tested. We select g = 1/3 as the optimal parameter for the magnetic Laplacian, and compare the log-likelihood of two models in figure 8b . This time the trophic version achieves a higher maximum likelihood, favouring a linear structure.

Caenorhabditis elegans is the only organism whose neural network has been fully mapped. The neural network of C. elegans 3 [29] is unweighted and directed, representing connections between neurons and fig. 5a ]. synapses [33] . We investigate its largest strongly connected component with 109 nodes and 637 edges. The optimal value for the parameter g among the test points is g = 1/5 (figure 9a). The trophic Laplacian algorithm achieves a higher maximum likelihood than the magnetic Laplacian algorithm using figure 9b. This preference for a linear directed structure is consistent with the tube-like shape of the organism [34] .

A summary of further real-world network comparisons is given in table 1. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144

In the dataset column, we use (s) and (w) to indicate whether the largest strongly or weakly connected component is analysed, respectively. The fourth column specifies the optimal parameter g for the magnetic Laplacian determined through grid search among the test points g = 1/2, 1/3, 1/4, 1/5, 1/6. The decay parameter γ used for the grid search ranges from 0 to 20 with a step size of 0.5. The last column shows the logarithm of the ratio between the maximum likelihoods of the directed pRDRG and trophic models. Hence, periodic/linear structure is seen to be favoured for the networks in the first 8 rows/last 7 rows.

Spectral methods can be used to extract structures from directed networks, allowing us to detect clusters, rank nodes and visualize patterns. This work exploited a natural connection between spectral methods for directed networks and generative random graph models. We showed that the magnetic Laplacian and tropic Laplacian can each be associated with a range-dependent random graph. In the magnetic Laplacian case, the new random graph model has the interesting property that the probabilities of i → j and j → i connections are not independent. Our theoretical analysis provided a workflow for quantifying the relative strength of periodic versus linear directed hierarchy, using a likelihood ratio, adding value to the standard approach of visualizing a new graph layout or reordering the adjacency matrix.

We demonstrated the model comparison workflow on synthetic networks, and also showed examples where real networks were categorized as more linear or periodic. The results illustrate the potential for the approach to reveal interesting patterns in networks from ecology, biology, social sciences and other related fields.

There are several promising directions for related future work. It would be of interest to use the likelihood ratios to compare this network feature across a well-defined category in order to address questions such as 'are results between top chess players more or less periodic than results between top tennis players?' and 'does an organism that is more advanced in an evolutionary sense have more periodic connectivity in the brain?' An extension of the comparison tool to weighted networks should also be possible; here there are notable, and perhaps application-specific, issues about how to generalize and interpret the magnetic Laplacian. Also, the comparison could be extended to include other types of structure, including stochastic block and core-periphery versions [39] . This introduces further challenges of (a) accounting for different numbers of model parameters and (b) dealing with nonlinear spectral methods. Furthermore, by introducing an appropriate null model it may be possible to quantify the presence of linear or periodic hierarchies in absolute, rather than relative, terms.

Data accessibility. This research made use of public domain data that are available from the Internet, as indicated in the text. Code for the experiments is available at https://github.com/OpalGX/Directed-Network-Laplacians.

A tutorial on spectral clustering

Linear algebra and learning from data

Higherorder organization of complex networks

Hermitian matrices for clustering directed graphs: insights and applications

Magnetic eigenmaps for community detection in directed networks

2020 How directed is a directed network?

Clustering and community detection in directed networks: a survey

Trophic incoherence drives systemic risk in financial exposure networks

Measuring circular supply chain risk: a Bayesian royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 211144 network methodology

Echo chambers in climate science

Magnetic eigenmaps for the visualization of directed networks

Unravelling small world networks

Range-dependent random graphs and their application to modeling large small-world proteome datasets

Periodic reordering

Spectral graph theory

Spectral clustering and its use in bioinformatics

2020 Spectral clustering for directed networks

Laplacians and the Cheeger inequality for directed graphs

An extension of the angular synchronization problem to the heterogeneous setting

Handbook of matrices

Digraphs are different: why directionality matters in complex systems

Several measures of trophic structure applicable to complex food webs

Graph hierarchy: a novel approach to understanding hierarchical structures in complex networks

Navigation in a small world

How to generate equidistributed points on the surface of a sphere

SNAP datasets: Stanford Large Network Dataset Collection

Network analysis of trophic dynamics in South Florida ecosystems

The influence matrix methodology: a technical report

Network motifs: simple building blocks of complex networks

Nonoptimal component placement, but short processing paths, due to long-distance projections in neural systems

The nematode Caenorhabditis elegans

Clustering by passing messages between data points

The political blogosphere and the 2004 US election: divided they blog

Rationing social contact during the COVID-19 pandemic: transmission risk and social benefits of US locations

Finding community structure in networks using the eigenvectors of matrices

A nonlinear spectral method for core-periphery detection in networks

Acknowledgements. The authors thank Colin Singleton from the CountingLab for suggesting the Dunnhumby data used in table 1 and providing advice on data analysis.