Topology Identification under Spatially Correlated Noise

Mishfad Shaikh Veedu and Murti V. Salapaka

December 8, 2020

This article addresses the problem of reconstructing the topology of a network of agents interacting via linear dynamics, while being excited by exogenous stochastic sources that are possibly correlated across the agents, from time-series measurements alone. It is shown that, under the assumption that the correlations are affine in nature, such a network of nodal interactions is equivalent to a network with added agents, represented by nodes that are latent, where no corresponding time-series measurements are available; however, here all exogenous excitations are spatially (that is, across agents) uncorrelated. Generalizing affine correlations, it is shown that, under polynomial correlations, the latent nodes in the expanded networks can be excited by clusters of noise sources, where the clusters are uncorrelated with each other. The clusters can be replaced with a single noise source if the latent nodes are allowed to have non-linear interactions. Finally, using the sparse plus low-rank matrix decomposition of the imaginary part of the inverse power spectral density matrix (IPSDM) of the time-series data, the topology of the network is reconstructed. Under non-conservative assumptions, the correlation graph is retrieved.

Networks and graphical models provide convenient tools for simple representation of complex, high-dimensional multi-agent systems. This simplified representation is useful in applications such as power grids [31, 45], meteorology [18], neuroscience [3], and finance [4]. Knowledge of the network interaction structure (also known as the network topology) helps in understanding, predicting, and, in many applications, controlling/identifying the system behavior [15, 23, 27, 33, 37, 43]. In applications such as power grids, finance, and meteorological systems, it is either difficult or impossible to intervene and affect the system. Hence, inferring the network properties by passive means such as time-series measurements is of great interest in these applications. An example is unveiling the correlation structure between the stocks in the stock market from daily share prices [4], which is useful in predicting the market behavior.

Learning the conditional independence relation between variables from time-series measurements has been an active research field among the machine learning (ML), probabilistic graphical model (PGM), and statistics communities [5, 7, 28, 29, 32, 36], where the system modules were considered as random variables. However, such studies fail to capture dynamic dependencies between the entities in a system, which are prevalent in most of the aforementioned applications. For dynamical systems, autoregressive (AR) models that are excited by exogenous Gaussian noise sources, which are independent across time and variables, were explored in [1, 2, 38]. Here, the graph topology captured the sparsity pattern of the regressor coefficient matrices, which characterized the conditional independence between the variables. The works [2, 38] showed that the sparsity pattern of the inverse power spectral density matrix (IPSDM), also known as the concentration matrix, identifies the conditional independence relation between the variables.
As shown in [26], the conditional independence structure between the variables is equivalent to the moral-graph structure of the underlying graph. For multi-agent systems with linear dynamical interactions, excited by wide sense stationary (WSS) noise sources that are mutually uncorrelated across the agents, moral-graph reconstruction using Wiener projection gained popularity in the last decade [20, 24, 26]. Here, the idea is to reconstruct the moral-graph of the underlying linear dynamic influence model (LDIM) from the magnitude response of the Wiener coefficients, obtained from time-series measurements. For a wide range of applications, the spurious connections in the moral-graph can be identified (returning the true topology) by observing the phase response of the Wiener coefficients [41]. Furthermore, for systems with strictly causal dynamical dependencies, Granger-causality-based algorithms can unveil the exact cause-effect nature of the interactions, also known as the exact parent-child relationship of the underlying graph [13, 14, 26].

In many networks, time-series measurements are available at only a subset of the nodes. The nodes where the time-series measurements are not available are known as latent/confounder/hidden nodes. Topology reconstruction is more challenging in the presence of latent nodes, as additional spurious connections due to confounding effects are formed when the aforementioned techniques are applied here. AR model identification of the independent Gaussian time-series discussed above, in the presence of latent nodes, was studied in [8, 9, 10, 12, 16, 21, 46, 47]. Here, the primary goal was to eliminate the spurious connections due to latent nodes and retrieve the original conditional independence structure from the observed time-series. In applications such as power grids, there is a need to retrieve the complete topology of the network, including that of the latent nodes. Such problems were studied for bidirected tree [40], poly-forest [34], and poly-tree [25, 35] networks excited by WSS noise sources that are uncorrelated across the agents. Recently, an approach to reconstruct the complete topology of a general linear dynamical network with WSS noise was provided in [44]. The work in [44] can be considered as a generalization of AR model identification with latent nodes, in the asymptotic time-series regime.

However, one major caveat of the aforementioned literature on graphical models is that the results fail if the exogenous noise sources are spatially correlated, i.e., if the noise is correlated across the agents/variables. The system identification literature has indeed studied systems with spatially correlated noise sources [17, 27, 33], but under the assumption that the exact topology of the network is available. Therefore, this article aims to bridge the gap in the literature by studying topology identification of LDIMs that allow spatially correlated noise sources. To the best of our knowledge, this is the first work in the network topology identification literature that considers general spatially correlated noise sources.

The first major contribution of this article is to provide a transformation that converts an LDIM without latent nodes, but excited by spatially correlated exogenous noise sources, into LDIMs with latent nodes. Here, the latent nodes are characterized by the maximal cliques in the correlation graph, the undirected graph that represents the spatial correlation structure.
It is shown that, under the affine correlation assumption, that is, when the correlated noise sources are related in an affine way (Assumption 1), there exist transformed LDIMs with latent nodes, where all the nodes are excited by spatially uncorrelated noise sources. A key feature of the transformation is that the correlations are completely captured using the latent nodes, while the original topology remains unaltered. The original moral-graph/topology is the same as the moral-graph/topology among the observed nodes in the altered graph. Thus, this transformed problem is shown to be equivalent to the topology identification of networks with latent nodes discussed before. Consequently, any of the aforementioned techniques for networks with latent nodes can be applied on the transformed LDIM to reconstruct the original moral-graph/topology. Next, relaxing the affine correlation assumption, polynomial correlation is considered, where the correlated noise sources are correlated via a polynomial relation (Assumption 2); here, the focus is on noise sources with distributions that are symmetric around the mean. It is shown that, under this scenario, the transformed dynamical model can be excited by clusters of noise sources, where the clusters are uncorrelated. However, the noise sources inside the clusters can be correlated. Then, using the sparse+low-rank decomposition of the IPSDM from [44], the true topology of the network along with the correlation structure is reconstructed, from the IPSDM of the original LDIM, without any additional information, if the network satisfies a necessary and sufficient condition. Notice that the results discussed here are also applicable to networks of static random variables and to AR models under spatially correlated exogenous noise sources.

The article is organized as follows. Section 2 introduces the system model, including the LDIM, and the essential definitions. Section 3 discusses IPSDM based topology reconstruction. Section 4 describes the correlation graph to latent nodes transformation and some major results. Section 5 discusses the transformation of LDIMs with polynomial correlation among the correlated nodes to LDIMs with latent nodes. Section 6 explains the sparse+low-rank decomposition technique and how the topology can be reconstructed without the knowledge of the correlation graph. Simulation results are provided in Section 7. Finally, Section 8 concludes the article.

Notation: Φ_x(z), z ∈ C, |z| = 1, denotes the power spectral density matrix (PSDM), where (Φ_x)_ij is the cross power spectral density between the time-series at nodes i and j; F denotes the set of real rational single-input single-output transfer functions that are analytic on the unit circle; A denotes the set of rationally related zero-mean jointly wide sense stationary (JWSS) scalar stochastic processes; Z(·) denotes the bilateral z-transform; S_n denotes the space of all skew-symmetric n × n matrices; N denotes the set of natural numbers, {0, 1, 2, . . . }; for a set S, |S| denotes the cardinality of the set.

Consider a network of n interconnected nodes, where node i is equipped with time-series measurements, (x_i(t))_{t∈Z}, i ∈ [n]. The network interaction is described by

x(z) = H(z) x(z) + e(z),    (1)

where x(z) = [x_1(z), . . . , x_n(z)]^T, H_ij ∈ F with H_ii = 0, and e = [e_1, . . . , e_n]^T is a vector of exogenous JWSS noise sources with PSDM Φ_e. The pair (H, e) is called an LDIM; its directed graph G(V, E), the linear dynamic influence graph (LDIG), has an edge (j, i) ∈ E whenever H_ij ≠ 0, and the correlation graph G_c(V, E_c) is the undirected graph with (i, j) ∈ E_c whenever Φ_{e_i e_j} ≠ 0. The problem we address is described as follows.

Problem 1 (P1) Consider a well-posed and topologically detectable LDIM, (H, e), where Φ_e is allowed to be non-diagonal and its associated graphs G(V, E) and G_c(V, E_c) are unknown. Given the power spectral density matrix Φ_x, where x is given by (1), reconstruct the topology of G.
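To make the data of Problem 1 concrete, the following minimal Python sketch (not from the paper) builds Φ_x = (I_n − H)^{-1} Φ_e (I_n − H)^{-*} at a single frequency for a hypothetical 3-node LDIM with a non-diagonal Φ_e. The specific H, Φ_e, and frequency are illustrative assumptions and need not match Fig. 1.

```python
import numpy as np

def psd_x(H, Phi_e):
    """Phi_x(z) = (I - H(z))^{-1} Phi_e(z) (I - H(z))^{-*} at a fixed z."""
    n = H.shape[0]
    T = np.linalg.inv(np.eye(n) - H)           # (I - H)^{-1}
    return T @ Phi_e @ T.conj().T

z = np.exp(1j * 3 * np.pi / 8)                 # one point on the unit circle
# Hypothetical chain 3 -> 1 -> 2 (H_13 and H_21 non-zero), strictly causal.
H = np.array([[0,       0, 0.5 / z],
              [0.4 / z, 0, 0      ],
              [0,       0, 0      ]], dtype=complex)
# Non-diagonal noise PSDM: e1 and e2 are spatially correlated, e3 is not.
Phi_e = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.0],
                  [0.0, 0.0, 1.0]], dtype=complex)

Phi_x = psd_x(H, Phi_e)                        # the data available in Problem 1
ipsdm = np.linalg.inv(Phi_x)
print(np.round(ipsdm, 3))                      # entry (2, 3) is non-zero even
# though nodes 2 and 3 are not kins -- a preview of the issue in Section 3.
```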
In this section, the IPSDM based topology reconstruction is presented. In [26], the authors showed that, for any LDIM characterized by (1), the availability of the IPSDM, which can be written as

Φ_x^{-1}(z) = (I_n − H(z))^* Φ_e^{-1}(z) (I_n − H(z)),

is sufficient for reconstructing the moral-graph of (H, e). That is, if (Φ_x^{-1})_ij ≠ 0, then i and j are kins. However, an important assumption for the result to hold true is that Φ_e^{-1} is diagonal. If Φ_e^{-1} is non-diagonal, then the result does not hold in general. For i ≠ j,

(Φ_x^{-1})_ij = (Φ_e^{-1})_ij − (H^* Φ_e^{-1})_ij − (Φ_e^{-1} H)_ij + (H^* Φ_e^{-1} H)_ij,

and any of the four terms can cause (Φ_x^{-1})_ij ≠ 0, depending on Φ_e^{-1}. Hence, this technique cannot be applied directly to solve Problem 1. For example, consider the network G given in Fig. 1a, where e_1 and e_2 are correlated: if H_13 ≠ 0, the term (Φ_e^{-1} H)_23 contains the non-zero contribution (Φ_e^{-1})_21 H_13, which (generically) implies that the estimated topology has (2, 3) present, while (2, 3) ∉ kin(G). In the next section, the correlation graph, G_c(V, E_c), is scrutinized and properties of G_c(V, E_c) are evaluated as the first step in unveiling the true topology when noise sources admit correlations.

In this section, a transformation of the LDIM, (H, e), into an LDIM with latent nodes is obtained by exploiting the structural properties of the noise correlation. The transformation converts an LDIM without latent nodes and driven by spatially correlated exogenous noise sources into an LDIM with latent nodes that are excited by spatially uncorrelated exogenous noise sources. Assuming perfect knowledge of the noise correlation structure, the latent nodes and their children in the transformed LDIM are characterized. It is shown that, although the transformation is not unique, the topology of the transformation is unique under affine correlation (Assumption 1). For ẽ = [ẽ_o; ẽ_h] and H̃ := [H, F], the pair (H̃, ẽ) (or, with a slight abuse of notation, ([H, F], ẽ)) denotes an LDIM with L latent nodes, where F_ik denotes the directed edge weight from latent node k to observed node i and H_ik denotes the directed edge weight from observed node k to observed node i. Notice that the latent nodes considered in this article are strict parents, and so they do not have incoming edges. The reason will become clear to the reader from Section 4.1 below.

Here, it is demonstrated that the spatially correlated exogenous noise sources can be viewed as the children of a latent node that is a common parent of the correlated sources. The idea is explained in the motivating example below. Towards this, let us define the following notion of equivalent networks.

Definition 1 Let (H^(1), e^(1)) and (H^(2), e^(2)) be two LDIMs and let x^(1) = (I_n − H^(1))^{-1} e^(1) and x^(2) = (I_n − H^(2))^{-1} e^(2), respectively. Then, the LDIMs (H^(1), e^(1)) and (H^(2), e^(2)) are said to be equivalent, denoted (H^(1), e^(1)) ≡ (H^(2), e^(2)), if Φ_{x^(1)} = Φ_{x^(2)}.

Consider the network of three nodes in Fig. 1a. Suppose the noise sources e_1 and e_2 are correlated, that is, Φ_{e_1 e_2} ≠ 0. Now, consider the network in Fig. 1b, where ẽ_1, ẽ_2, e_4, and e_3 are jointly uncorrelated and node 4 is a latent node. Note that e_3 is the same in both LDIMs, while ẽ_1, ẽ_2 are different from e_1, e_2. Writing the node equations for the LDIM of Fig. 1a and for the LDIM of Fig. 1b (equations (3) and (4), respectively) and comparing them, it is evident that if ẽ_1, ẽ_2, and e_4 are such that e_1 = ẽ_1 + h_14 e_4 and e_2 = ẽ_2 + h_24 e_4, then the time-series obtained from both the LDIMs are identical. That is, the LDIMs are equivalent. A natural question then is to ask what the equivalent LDIMs (in the sense of Definition 1) would look like if there is more than a pair of correlated noise sources.
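The equivalence in this motivating example is easy to verify numerically. The sketch below (illustrative edge weights and variances, not the values of the paper's Fig. 1) compares the observed PSDM produced by a 3-node LDIM whose correlated pair (e_1, e_2) has the affine form above against the PSDM of the 4-node LDIM with one latent node and mutually uncorrelated sources; the symbols h14, h24 and v1 to v4 are assumptions introduced here.

```python
import numpy as np

z = np.exp(1j * np.pi / 5)
h14, h24 = 0.8 / z, 0.5 / z                     # latent-to-observed edge weights
H = np.array([[0, 0, 0.3 / z],
              [0, 0, 0.4 / z],
              [0, 0, 0      ]], dtype=complex)  # hypothetical observed dynamics

# (i) original LDIM: correlated noise with Phi_e1e2 = h14 * conj(h24) * v4
v1, v2, v3, v4 = 0.5, 0.7, 1.0, 1.0
Phi_e = np.array([[v1 + abs(h14)**2 * v4, h14 * np.conj(h24) * v4, 0],
                  [np.conj(h14) * h24 * v4, v2 + abs(h24)**2 * v4, 0],
                  [0, 0, v3]], dtype=complex)
T = np.linalg.inv(np.eye(3) - H)
Phi_x_corr = T @ Phi_e @ T.conj().T

# (ii) transformed LDIM: 4 nodes, uncorrelated noises, F = [[h14], [h24], [0]]
F = np.array([[h14], [h24], [0]], dtype=complex)
G = np.hstack([np.eye(3), F])                   # G = [I_n, F]
Phi_et = np.diag([v1, v2, v3, v4]).astype(complex)
Phi_x_latent = T @ G @ Phi_et @ G.conj().T @ T.conj().T

print(np.allclose(Phi_x_corr, Phi_x_latent))    # True: the LDIMs are equivalent
```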
For a given LDIM, (H, e), with correlated exogenous noise sources, one can define a space of equivalent LDIMs with uncorrelated noise sources that provide the same time-series. The following definition captures this space of "LDIMs with latent nodes and uncorrelated noise sources" that are equivalent to the original LDIM with correlated noise sources.

Assumption 1 Two random processes X, Y ∈ A are correlated only via affine interactions, i.e., Φ_XY ≠ 0 if and only if there exist h ∈ F and W ∈ A, with W uncorrelated with Y, such that X = hY + W.

If instead the correlation is generated by a non-affine relation, say X = f(Y) + W for some non-affine f, then the corresponding characterizing transformation, L (defined below for affine correlations), would render the transformed LDIM non-linear. This is addressed in Section 5.

Definition 2 For any LDIM, (H, e), with Φ_e non-diagonal and satisfying Assumption 1, L(H, e), given in (5), is the space of all L-transformations for a given (H, e): the set of LDIMs ([H, F], ẽ) with L latent nodes such that Φ_ẽ is diagonal and ([H, F], ẽ) is equivalent to (H, e).

Based on the definition of L(H, e) in (5), the LDIM ([H, F], ẽ) can be rewritten as x(z) = H(z) x(z) + G(z) ẽ(z), where the components ẽ(z) = [ẽ_1(z), . . . , ẽ_n(z), . . . , ẽ_{n+L}(z)] are mutually uncorrelated and G = [I_n, F]. An LDIM, (H̃, ẽ) ∈ L(H, e), is called a transformed LDIM of (H, e) in this article. Intuitively, the L-transformation completely assigns the spatial correlation component present in the original noise source, e, to latent nodes, without altering the LDIG, G(V, E), similar to the aforementioned motivating example. For the LDIM in Fig. 2a with correlation graph in Fig. 2b, Fig. 2c and Fig. 2d show some examples of the LDIGs that belong to L(H, e). Thus, the L-transformation returns a larger network with latent nodes. Notice that the latent nodes present in an LDIM (H̃, ẽ) ∈ L(H, e) are strict parents, whose interactions with the observed nodes are characterized by F.

The following lemma formalizes the relation between the correlation graph and the latent nodes in the transformed higher dimensional LDIMs. In particular, it shows that two noise sources e_i and e_j are correlated if and only if there is a latent node which is a common parent of both the nodes i and j in the transformed LDIM.

Lemma 2 Let (H, e) be an LDIM that satisfies Assumption 1, and let G_c(V, E_c) be the correlation graph of {e_i}_{i=1}^n. Then, for every distinct i, j ∈ [n], (i, j) ∈ E_c if and only if, for any LDIM (H̃, ẽ) ∈ L(H, e), the corresponding LDIG contains a latent node h such that h ∈ Pa(i) ∩ Pa(j).

Thus, e_i and e_j in the original LDIM are correlated if and only if for every L-transformed LDIM there exists a k such that F_ik ≠ 0 and F_jk ≠ 0. The following theorem (Theorem 3) shows that each maximal clique G_l(V_l, E_l) of the correlation graph with |V_l| > 1 is covered by latent nodes of the transformed LDIM, that is,

V_l = ∪_{i=1}^{k_l} Ch(h_i^l)    (7)

for some k_l ∈ N \ {0} latent nodes h_1^l, . . . , h_{k_l}^l of (H̃, ẽ). In particular, for any latent node h in the LDIG of (H̃, ẽ), Ch(h) forms a clique in G_c(V, E_c). Proof: Refer Appendix B. □

Remark 2 In (7), depending on the LDIM, k_l can take more than one value in N \ {0}. In other words, for any given maximal clique (with set of nodes V_l) in G_c and for any k ≥ 1, there exists a transformed LDIM (H̃, ẽ) ∈ L(H, e) with k latent nodes such that the set V_l is equal to the union of the children of those latent nodes. For |V_l| = 3, Fig. 2 shows examples with k_l = 1 and k_l = 3.

The following lemma (Lemma 4), which follows directly from Theorem 3, shows that the number of latent nodes, L, present in any (H̃, ẽ) ∈ L(H, e) is at least Q_c, the number of maximal cliques with clique size greater than one in G_c. Proof: This follows from (7), where k_l ≥ 1 for all l ∈ {1, . . . , Q_c}. □

Lemma 5 Let L(H, e, Q_c) ⊆ L(H, e) denote the set of transformed LDIMs with exactly L = Q_c latent nodes, and let (H̃, ẽ) ∈ L(H, e, Q_c). Then, for every maximal clique G_l(V_l, E_l) ⊆ G_c, there exists a unique latent node ℓ in the LDIG of (H̃, ẽ) such that, for any i, j ∈ V_l, ℓ ∈ Pa(i) ∩ Pa(j). Proof: Refer Appendix C. □
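Lemma 4 and Lemma 5 tie the latent nodes of a transformation to the maximal cliques of G_c. As a small illustration (the correlation graph below is arbitrary, not the one in Fig. 2), the maximal cliques of size greater than one can be enumerated with networkx, giving the minimum number of latent nodes Q_c and the children of each latent node in an LDIM from L(H, e, Q_c).

```python
import networkx as nx

# Illustrative correlation graph G_c on 6 observed nodes.
Gc = nx.Graph()
Gc.add_nodes_from(range(1, 7))
Gc.add_edges_from([(1, 2), (1, 3), (2, 3), (4, 5)])  # two correlated groups

cliques = [c for c in nx.find_cliques(Gc) if len(c) > 1]
Qc = len(cliques)          # minimum number of latent nodes (Lemma 4)
print(cliques, Qc)         # e.g. [[1, 2, 3], [4, 5]], Qc = 2

# A transformed LDIM in L(H, e, Qc) adds one latent node per maximal clique,
# with directed edges from that latent node to every node of the clique.
latent_children = {f"h{l + 1}": sorted(c) for l, c in enumerate(cliques)}
print(latent_children)     # children of each latent node = one maximal clique
```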
Next, the transformation of correlations into latent variables is described using L(H, e, Q_c). Here, the LDIMs in L(H, e, Q_c) from the previous section are studied more carefully. The following is an explicit construction of the transformation, obtained by applying Lemma 5; Fig. 4a-4e demonstrates this construction. For any (H, e), let the correlation graph be G_c = ∪_{l} G_c^l(V_l, E_l), where G_c^l(V_l, E_l) denotes clique l and Q̄_c denotes the number of maximal cliques with clique size ≥ 1 (so that Q_c counts only the cliques of size greater than one). For example, Fig. 4c shows a correlation graph with Q_c = 3 and Q̄_c = 4. Next, construct an (n + Q_c)-node graph G_t, where, for each l ∈ [Q_c], n + l denotes the latent node that captures the correlations in G_c^l (see Fig. 4e). For any (k, i) ∈ E_2^l, the edge weight is given by F_ik. Notice that E_2^l is obtained by removing the edges (i, j) ∈ E_l and replacing them with the two directed edges (n + l, i) and (n + l, j). Furthermore, n + l does not have a parent. Let the nodes i ∈ [n + Q_c] be excited by ẽ_i ∈ A that is uncorrelated with ẽ_j ∈ A for j ∈ [n + Q_c], i ≠ j. Notice that the LDIG G_t is characterized by (H̃, ẽ).

In topology reconstruction, only the support structure of the transfer functions matters. The following proposition (Proposition 7) shows that this support structure is unique, that is, every (H̃, ẽ) ∈ L(H, e, Q_c) has the same topology T. Let (P2) denote the problem of reconstructing, from Φ_x, the topology among the observed nodes of an LDIM in L(H, e, Q_c). The following theorem shows that the problems are equivalent.

Theorem 9 The problems (P1) and (P2) are equivalent. That is, the topology reconstructed in both the problems is the same. Proof: From Proposition 7, we have that for any transformed network (H̃, ẽ) ∈ L(H, e, Q_c), the topology is consistent. Moreover, T|_{n×n} = I_{H ≠ 0}. Therefore, the topology among the observed nodes of (H̃, ẽ) is equal to T(V, E). The theorem follows. □

Therefore, instead of reconstructing the topology of (H, e) with Φ_e non-diagonal, it is sufficient to reconstruct the topology among the observed nodes for one of the LDIMs in L(H, e, Q_c).

The results in the previous section assumed that the correlations are affine in nature. In this section, a generalization of the affine correlation is addressed. It is shown that noise sources that are correlated via a non-affine, but polynomial, interaction can be characterized using latent nodes with non-linear interaction dynamics. By lifting the processes to a higher dimension, the non-linearity is converted into a linear interaction. The following definitions are useful for the presentation in this section.

Suppose e_1 = Σ_{|α|≤M} a_{α,1} v^α and e_2 = Σ_{|α|≤M} a_{α,2} v^α, where a_{α,1}, a_{α,2} ∈ F and v ∈ A^m, m, M ∈ N, with Φ_v diagonal. Let y = P(v, p) be the vector obtained by concatenating v^α, |α| ≤ p. Then, e_1 = A_1 y and e_2 = A_2 y, and Φ_{e_1 e_2} = A_1 Φ_y A_2^*, where A_i is the vector obtained by concatenating a_{α,i}. Therefore, understanding Φ_{e_1 e_2} requires one to know what Φ_y is. Notice that y is a lifting of v into a higher dimensional space of polynomials. In the following, a discussion on the structure of Φ_y is provided, with the help of examples.

To obtain insights into the lifting of noise processes to a higher dimension, here the lifting of an independent and identically distributed (IID) Gaussian process (GP) is studied. It is shown that, under lifting, the PSDM is block diagonal. Recall that, for a zero-mean Gaussian random variable x with variance σ^2, E{x^p} = 0 for odd p and E{x^p} = σ^p p!! for even p, where p!! := (p − 1)(p − 3) . . . 3 · 1 denotes the double factorial. To get an insight into the problem, consider m = 2 and p = 3, and let y = [y_1, . . . , y_10] := [1, v_1, v_2, v_1^2, v_1 v_2, v_2^2, v_1^3, v_1^2 v_2, v_1 v_2^2, v_2^3]. Since the odd moments of v_1 and v_2 vanish, E{y_2 y_k} = E{v_1 y_k} is non-zero only for k = 2, 7, 9. Notice that k = 2, 7, 9 corresponds to the terms with odd power on v_1 and even power on v_2.
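The same parity argument applies to every pair E{y_i y_k}. The following short numerical check (a sketch with an assumed graded-lexicographic monomial ordering and an illustrative sample size) estimates the second-moment matrix of the lifted vector for this m = 2, p = 3 example and confirms that, after grouping the monomials by the odd/even pattern of their exponents, all cross-group entries vanish up to sampling error.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
v = rng.standard_normal((200_000, 2))            # IID zero-mean Gaussian pair

# All monomials v1^a * v2^b with a + b <= 3 (10 of them, including the constant).
exponents = [(a, b) for a, b in product(range(4), repeat=2) if a + b <= 3]
y = np.stack([v[:, 0]**a * v[:, 1]**b for a, b in exponents], axis=1)

# Group the monomials by the parity pattern of their exponents (2^m = 4 groups).
parity = [(a % 2, b % 2) for a, b in exponents]
order = np.argsort([2 * pa + pb for pa, pb in parity])
p_sorted = [parity[i] for i in order]

R = (y[:, order].T @ y[:, order]) / len(y)       # empirical E{y_i y_k}

same_group = np.array([[pi == pj for pj in p_sorted] for pi in p_sorted])
print(np.abs(R[~same_group]).max())              # ~0: cross-group moments vanish
print(np.abs(R[same_group]).max())               # within-group moments are not small
```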
Repeating the same for every E{y_i y_k}, 1 ≤ i, k ≤ 10, one can show that, after an appropriate rearrangement of rows and columns, the covariance matrix and the PSDM of y form a block diagonal matrix, given by (9). Here, ỹ = [y_1, y_4, y_6, y_2, y_7, y_9, y_3, y_8, y_10, y_5]. It is worth mentioning that, since the Gaussian process is zero-mean IID, the auto-correlation function satisfies R_ỹ(τ) = R_ỹ(0) δ(τ), where δ is the Kronecker delta, and thus Φ_ỹ(z) = R_ỹ(0), ∀|z| = 1, is white (the same for all frequencies). The following proposition shows this for general m, p.

Proposition 10 Let v ∈ A^m be a zero-mean IID Gaussian process with Φ_v diagonal and let y = P(v, p). Then, there exists a ỹ, a permutation of y, such that Φ_ỹ is block diagonal with 2^m non-zero blocks. Proof: Refer Appendix E. □

Based on this example, one can extend the result to symmetric zero-mean WSS processes. Symmetric distributions are the distributions with probability density functions symmetric around zero, for example, zero-mean Gaussian distributions.

Definition 6 A probability distribution p(·) is a symmetric distribution (around the mean) if p(µ − x) = p(µ + x), where µ is the mean.

Proposition 11 Consider a WSS process with symmetric distribution (Definition 6), v ∈ A^m, and Φ_v diagonal. Let y = P(v, p). Then, there exists a ỹ, a permutation of y, such that Φ_ỹ is block diagonal with 2^m non-zero blocks. Proof: The proof is similar to the IID GP case. For symmetric distributions, odd moments are zero, since odd moments are integrals of odd functions. □

As shown in the proof of Proposition 10, the monomial nodes corresponding to a given diagonal block (i.e., the same odd-even pattern) can be grouped into a cluster. Fig. 3c shows such a clustering with m = 2 and p = 3. The red nodes are the lifted processes in the higher dimension (the monomial nodes y_1, . . . , y_10). The nodes inside a given cluster (shown by a blue ellipse) are correlated with each other, while the nodes from different clusters are not correlated with each other; this is an artifact of Φ_ỹ being block-diagonal. (Fig. 3 caption: here, e_1 = ẽ_1 + F_12 y_2 + F_17 y_7 + F_19 y_9, e_2 = ẽ_2 + F_22 y_2 + F_27 y_7 + F_29 y_9 + F_21 y_1 + F_24 y_4 + F_26 y_6, and e_3 = ẽ_3 + F_31 y_1 + F_34 y_4 + F_36 y_6.) Here, e_1 and e_2 are correlated via the cluster of y_2, y_7, and y_9, whereas e_2 and e_3 are correlated via the cluster of y_1, y_4, and y_6. Notice the following important distinction from the LDIMs in Fig. 1. Here, it is sufficient for the observed nodes to be connected to one of the latent nodes in a given cluster. It is not necessary for the nodes to have a common parent as in the case of Fig. 1. The common parent property from Section 4 is replaced with a common ancestral cluster here.

Based on the aforementioned discussion, a relaxation of Assumption 1 and an extension of the LDIM transformation results from Section 4 are presented here. It is shown that, in order to relax Assumption 1, one might need non-linear interactions between the latent nodes and the observed nodes. The following assumption is a generalization of Assumption 1.

Assumption 2 Two random processes X, Y are correlated if and only if there exist polynomials P_1, P_2 and a process v ∈ A^m with Φ_v diagonal such that X = P_1(v) + W_1 and Y = P_2(v) + W_2, where W_1, W_2 ∈ A are uncorrelated with v and with each other.

Remark 3 To be precise, the extension of Assumption 1 is given by y = P(x), where x ∈ A and P(x) = Σ_α a_α x^α, a_α ∈ F, is a polynomial. Any x ∈ A can be written as x = e_x + b v, where e_x, v ∈ A, b ∈ F, and e_x is uncorrelated with v. Then, y = P(e_x + b v), which is a special case of Assumption 2.

Definition 7 For any LDIM, (H, e), with Φ_e non-diagonal and satisfying Assumption 2, L_p(H, e) is the space of all L_p-transformations for a given (H, e). The matrix F is obtained by concatenating the columns F_α ∈ F^{n×1}.
This is done by first listing all the F_α corresponding to degree-one monomials in lexicographic order, then the degree-two monomials in lexicographic order, and so on, where F_{b_{i_1}+···+b_{i_k}} denotes the column vector of F corresponding to α = b_{i_1} + · · · + b_{i_k}, with b_i the canonical basis vectors. For m = 2, p = 3 from Section 5.2, this ordering coincides with the ordering of the monomials y_1, . . . , y_10 given there.

With this new "polynomial lifting" definition, one can transform a given LDIM with correlated noise sources into a transformed dynamical influence model (DIM) with latent nodes. As shown in the following lemma, the transformation of correlations to uncorrelated latent nodes from Section 4 is replaced with uncorrelated latent clusters here. For any cluster c, Ch(c) := ∪_{i∈c} Ch(i), that is, Ch(c) denotes the union of the children of the nodes present in cluster c.

Lemma 12 Let (H, e) be an LDIM that satisfies Assumption 2, and let G_c(V, E_c) be the correlation graph of {e_i}_{i=1}^n. Then, for every distinct i, j ∈ [n], (i, j) ∈ E_c if and only if, for any DIM (H̃, y) ∈ L_p(H, e), the corresponding dynamic influence graph (DIG) contains a cluster c such that i, j ∈ Ch(c). Proof: Refer Appendix F. □

The following theorem, the analog of Theorem 3, shows that the correlation graph is covered by the subgraphs G_i(V_i, E_i), where E_i := {(k, j) : k, j ∈ Ch(c_i)}, determined by the clusters c_i in (H̃, y). In particular, for any cluster c in the DIG of (H̃, y), Ch(c) forms a clique in G_c(V, E_c). Proof: The proof follows similarly to the proof of Theorem 3, by replacing latent nodes with latent clusters, as in the proof of Lemma 12. □

As shown in Fig. 3d, if all three nodes in Fig. 3a are correlated, one can transform this LDIM to a DIM with latent node 4. However, here node 4 should capture the interactions from the latent nodes y_1, . . . , y_10. Therefore, during reconstruction, one should accommodate non-linear interactions between the latent node and the observed nodes. In the next section, we describe a way to perform the reconstruction, indeed, when G_c is unavailable.

In the previous sections, frameworks that convert a network with spatially correlated noise to a network with latent nodes were studied. In this section, a technique is provided to reconstruct the topology of the transformed (L)DIMs using the sparse+low-rank matrix decomposition of the IPSDM obtained from the observed time-series. Note that this result does not require any extra information other than the IPSDM obtained from the true LDIM.

Here, reconstruction under affine correlation is discussed. Recall from Definition 2 that e = ẽ_o + F ẽ_h. Then, the PSDM of e, Φ_e, can be written as [26] Φ_e = Φ_{ẽ_o} + F Φ_{ẽ_h} F^*, and hence

Φ_x^{-1} = (I_n − H)^* (Φ_{ẽ_o} + F Φ_{ẽ_h} F^*)^{-1} (I_n − H) (a)= S(z) + L(z),    (12)

where S(z) = (I_n − H)^* Φ_{ẽ_o}^{-1} (I_n − H) and L(z) = −(I_n − H)^* Φ_{ẽ_o}^{-1} F (Φ_{ẽ_h}^{-1} + F^* Φ_{ẽ_o}^{-1} F)^{-1} F^* Φ_{ẽ_o}^{-1} (I_n − H). Equality (a) follows from the matrix inversion lemma [19]. Thus, (12) can be rewritten as Φ_x^{-1}(z) = S(z) + L(z), which is similar to the model in [44]. If the moral-graph of the original LDIG is sparse and Q_c ≪ n, then S is sparse and L is low-rank, and by applying the algorithms in [44], one can retrieve the true moral-graph/topology from the support of S.

Under polynomial correlation, recall from Definition 7 that e(z) = e_o(z) + F ỹ(z). Let x_h = e_h = ỹ and consider the corresponding augmented DIM, where F_ij captures the influence of ỹ_j on x_i and Φ_ỹ is block diagonal. The topology of the sub-graph restricted to the observed nodes in the above DIM and the true topology of the network are equivalent. Moreover, similar to (12) (see [44] for details), one can obtain the decomposition in (13) to (15) exactly. Here S(z), L(z) ∈ C^{n×n}, Ψ ∈ C^{M×n}, and Λ ∈ C^{M×M}. If M ≪ n, then L is low-rank. Let J := {j ∈ [M] | ∃ i ∈ [n] with F_ij ≠ 0} be the set of monomials that have a non-zero contribution to e_i, i ∈ [n], and let L̄ := |J|. Under this scenario, L is low-rank if L̄ ≪ n.
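The structure in (12) is easy to see numerically: for an LDIM whose correlated noise has the form Φ_e = Φ_{ẽ_o} + F Φ_{ẽ_h} F^*, subtracting the observed-only term from Φ_x^{-1} leaves a matrix whose rank is bounded by the number of latent nodes. The sketch below uses an arbitrary 8-node network with two latent nodes; all weights and the frequency are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L_nodes = 8, 2
z = np.exp(1j * np.pi / 3)

H = np.zeros((n, n), dtype=complex)
for i, j in [(1, 0), (2, 1), (4, 3), (5, 4), (6, 5), (7, 0)]:   # sparse edges
    H[i, j] = 0.4 / z
F = np.zeros((n, L_nodes), dtype=complex)
F[[0, 1, 2], 0] = rng.standard_normal(3)      # latent node 1 -> clique {1, 2, 3}
F[[4, 5], 1] = rng.standard_normal(2)         # latent node 2 -> clique {5, 6}

Phi_eo = np.eye(n)                            # uncorrelated observed sources
Phi_eh = np.eye(L_nodes)                      # uncorrelated latent sources
A = np.eye(n) - H
Phi_e = Phi_eo + F @ Phi_eh @ F.conj().T      # correlated noise of the original LDIM
Phi_x_inv = A.conj().T @ np.linalg.inv(Phi_e) @ A

S = A.conj().T @ np.linalg.inv(Phi_eo) @ A    # sparse part (supported on the moral graph)
L = Phi_x_inv - S                             # remaining part
print(np.linalg.matrix_rank(L, tol=1e-8))     # 2  (= number of latent nodes)
```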
Following the procedure in [44], moral-graph/topology reconstruction from the imaginary part of the IPSDM is considered here. The following optimization program, modified from [6], is used to obtain the sparse+low-rank decomposition:

(Ŝ_t, L̂_t) = arg min_{S, L ∈ S_n} t ||S||_1 + (1 − t) ||L||_*, subject to S + L = C,    (17)

where C = ℑ{Φ_o^{-1}(z)}, for some z ∈ C, |z| = 1 [44]. Suppose that we are given a matrix C, which is the sum of a sparse matrix S̄ ∈ S_n and a low-rank matrix L̄ ∈ S_n. Then, a necessary and sufficient condition for the unique identifiability of (S̄, L̄) is [44] that Ω(S̄) ∩ T(L̄) = {0}, i.e., the spaces Ω(S̄) and T(L̄) intersect transversely. This roughly translates to S̄ (or I_{H ≠ 0}) being sparse and the number of cliques, Q_c, being small, with clique sizes not too small (the readers can find more details on this in [44]). Some sufficient (but not necessary) assumptions from [44] are also adopted for the exact recovery of the topology. Additionally, by applying the algorithms in [44], the correlation graph can also be reconstructed if the LDIM satisfies the following assumption, as shown in Section 7.

Assumption 5 For every pair of distinct latent nodes k_h, k_h' in the transformed dynamic graph, the distance between k_h and k_h' is at least four hops.

The following metrics are used to measure the accuracy of the estimates (Ŝ_t, L̂_t) in the optimization problem (17): tol_t, the relative error of (Ŝ_t, L̂_t) with respect to the true pair (S, L) in the Frobenius norm ||·||_F, and diff_t, the Frobenius-norm change between the estimates obtained at consecutive grid points t and t − ε, where ε > 0 is a sufficiently small fixed constant. Note that tol_t requires the knowledge of the true matrices S and L, whereas diff_t does not.

In this section, we demonstrate topology reconstruction of an LDIM (H, e) with Φ_e non-diagonal, from Φ_x^{-1}, using the sparse+low-rank decomposition technique discussed in Section 6, for the affinely correlated network shown in Fig. 4a. Simulations are done in Matlab. Yalmip [22] with the SDP solver SDPT3 [42] is used to solve the convex program (17). For the simulation, we assume we have access to the true PSDM, Φ_x, of the LDIM of Fig. 4a. Here, Φ_e is non-diagonal with the (unknown) correlation structure shown in Fig. 4c. For the reconstruction, the imaginary part of the IPSDM, C = ℑ{Φ_x^{-1}(z)}, is employed in the convex optimization (17) at the frequency point z = e^{j3π/8}. Optimization (17) is solved for all values of t ∈ [ε, 1], with the grid interval ε = 0.01. Notice that for t = 0, (Ŝ_t, L̂_t) = (C, 0). Fig. 5 shows the comparison of tol_t and diff_t versus t. (Ŝ_t, L̂_t) for t = 0.36 is picked, which belongs to the middle zero region of diff_t, as described in [44]. The support of the sparse estimate, I_{Ŝ_t ≠ 0}, returned the exact topology of Fig. 4b. From I_{L̂_t ≠ 0}, by following Algorithms 2 and 3 in [44], G_c(V, E_c) is also reconstructed, which matches Fig. 4c exactly.

In this article, the problem of reconstructing the topology of an LDIM with spatially correlated noise sources was studied. First, assuming affine correlation and the knowledge of the correlation graph, the given LDIM was transformed into an LDIM with latent nodes, where the latent nodes were characterized using the correlation graph and all the nodes were excited by uncorrelated noises. For polynomial correlation, a generalization of the affine correlation, the latent nodes in the transformed LDIMs were excited using clusters of noise, where the noise clusters were uncorrelated with each other. Finally, using a sparse+low-rank matrix decomposition technique, the exact topology of the LDIM was reconstructed, solely from the IPSDM of the true LDIM, when the network satisfied a sufficient condition required for the matrix decomposition.
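The paper's experiments use Yalmip/SDPT3 in Matlab; the following Python/cvxpy sketch is only a stand-in that illustrates the same recipe on a synthetic skew-symmetric C (a hand-built sparse part plus a rank-2 part): sweep t in a program of the form (17), track how much the solution changes between consecutive t values (a diff_t-like criterion), and read off the supports in a flat region. All matrices, thresholds, and grid values below are assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n = 8

# Synthetic ground truth: sparse skew-symmetric S_true + rank-2 skew-symmetric L_true.
S_true = np.zeros((n, n))
for i, j in [(0, 1), (2, 3), (4, 5), (1, 6)]:
    S_true[i, j], S_true[j, i] = 1.0, -1.0
u, w = rng.standard_normal((n, 1)), rng.standard_normal((n, 1))
L_true = u @ w.T - w @ u.T
C = S_true + L_true                     # stands in for Im(Phi_x^{-1}(z))

def split(C, t):
    """min t*||S||_1 + (1 - t)*||L||_* s.t. S + L = C, with S, L skew-symmetric."""
    S, L = cp.Variable((n, n)), cp.Variable((n, n))
    cons = [S + L == C, S == -S.T, L == -L.T]
    cp.Problem(cp.Minimize(t * cp.sum(cp.abs(S)) + (1 - t) * cp.normNuc(L)), cons).solve()
    return S.value, L.value

ts = np.arange(0.05, 1.0, 0.05)
sols = [split(C, t) for t in ts]
diff = [np.linalg.norm(sols[k][0] - sols[k - 1][0], 'fro') for k in range(1, len(ts))]

k = int(np.argmin(diff))                # a point in a flat region of the diff curve
S_hat, L_hat = sols[k + 1]
print(np.abs(S_hat) > 1e-3)             # support estimate -> moral graph / topology
print(np.linalg.matrix_rank(L_hat, tol=1e-3))   # should be close to 2
```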
Proof of Lemma 2: Suppose (i, j) ∈ E_c, i.e., Φ_{e_i e_j} ≠ 0. For any (H̃, ẽ) ∈ L(H, e), e_i = ẽ_i + Σ_k F_ik ẽ_{h_k}, and hence Φ_{e_i e_j} = Σ_k F_ik F*_jk Φ_{ẽ_{h_k}}, since ẽ_i and ẽ_j are uncorrelated for i, j ∈ [n + L]. Thus, Σ_k F_ik F*_jk Φ_{ẽ_{h_k}} ≠ 0, which implies that there exists a k such that F_ik ≠ 0 and F_jk ≠ 0. In other words, there exists a latent node h_k in the corresponding LDIG of ([H, F], ẽ) such that i, j ∈ Ch(h_k). Conversely, suppose (i, j) ∉ E_c, i.e., Φ_{e_i e_j} = 0. Then Σ_k F_ik F*_jk Φ_{ẽ_{h_k}} = 0, which implies that F_ik = 0 or F_jk = 0, ∀k, except for a few pathological cases that occur with Lebesgue measure zero. We ignore the pathological cases here. Hence, there does not exist any latent node h such that both i, j ∈ Ch(h). □

Proof of Theorem 3: Let V_l ⊆ [n], |V_l| > 1, be such that V_l forms a clique in G_c. Lemma 2 showed that, for any i, j ∈ V_l, there exists a latent node h such that i, j ∈ Ch(h) in the LDIG, G̃, of (H̃, ẽ). Since this is true for any pair i, j ∈ V_l, there exists a minimal set of latent nodes H_l := {h_i^l}_{i=1}^{k_l}, k_l ∈ N \ {0}, such that, for any i, j ∈ V_l, we have i, j ∈ Ch(h_p^l) for some h_p^l ∈ H_l, in G̃. Hence, V_l ⊆ ∪_{i=1}^{k_l} Ch(h_i^l). Next, we prove that all the children of a given latent node belong to a single (maximal) clique, which proves the reverse inclusion in (7). Let h ∈ H_l be a latent node in the LDIG of (H̃, ẽ) and suppose i, j ∈ Ch(h). Then, from the definition of (H̃, ẽ), there exists a column ℓ of F (corresponding to h) such that F_iℓ ≠ 0 and F_jℓ ≠ 0, and hence Φ_{e_i e_j} = F_iℓ Φ_{ẽ_h} F*_jℓ ≠ 0 a.e. Thus, (i, j) ∈ E_c, excluding the pathological cases. Notice that this is true for any i, j ∈ Ch(h). Therefore, Ch(h) ⊆ V forms a clique (not necessarily maximal) in G_c(V, E_c). Since this is true for every h ∈ H_l, V_l ⊇ ∪_{i=1}^{k_l} Ch(h_i^l), which completes the proof. □

The pigeonhole principle states that if n pigeons are put into m pigeonholes, with n > m, then at least one hole must contain more than one pigeon.

Proof of Lemma 5: We use the pigeonhole principle and Theorem 3 to prove this via a contrapositive argument. Recall that L = Q_c. Suppose there exists a clique G_l(V_l, E_l) such that, for some i, j ∈ V_l, there does not exist a latent node h ∈ (H̃, ẽ) such that i, j ∈ Ch(h). By Theorem 3, there exist latent nodes h, h' such that i ∈ Ch(h) and j ∈ Ch(h'). By Theorem 3 again, all the children of h are included in a single clique. That is, Ch(h) ⊆ V_l and Ch(h') ⊆ V_l, since i, j ∈ V_l. Then, excluding V_l, h, and h', we are left with Q_c − 1 cliques and L − 2 = Q_c − 2 latent nodes. Applying the pigeonhole principle, with Q_c − 1 pigeons (cliques) and L − 2 holes (latent nodes), there would exist at least one latent node k with Ch(k) belonging to two different maximal cliques, which is a contradiction of Theorem 3. □

Proof of Proposition 7: Let T_1, T_2 ∈ {0, 1}^{(n+L)×(n+L)}, T_1 ≠ T_2, be the topologies of two distinct transformations (H̃_1, ẽ_1) and (H̃_2, ẽ_2), respectively. Without loss of generality, let (i, j) be such that [T_1]_ij ≠ 0 and [T_2]_ij = 0. By the design of G_t, if i ≤ n and j ≤ n, then [T_1]_ij = [T_2]_ij = I_{H_ij ≠ 0}. If i ≤ n and j > n, then i is an observed node and j is a latent node. Again, from the design of G_t, F^1_ij = F^2_ij = 0 if and only if j ∉ Pa(i). Thus, i ∉ G_c^j from T_2 and i ∈ G_c^j from T_1, which is a contradiction, since both cannot be true. A similar contradiction holds if i > n and j ≤ n. If i, j > n, then [T_1]_ij = [T_2]_ij = 0, since Pa(i) = Pa(j) = ∅. Thus, the assumption leads to a contradiction, which implies that T_1 = T_2. □

Proof of Lemma 12: Since ẽ_i, ẽ_j, and ẽ_h are uncorrelated, and Φ_y is block diagonal (Proposition 11), Φ_{e_i e_j} is a sum, over the clusters c, of terms of the form F_ik_1 (Φ_ỹ)_{k_1 k_2} F*_{jk_2} with k_1, k_2 ∈ C_c. Thus, if (i, j) ∈ E_c, there exists a c such that F_ik_1 ≠ 0 and F_jk_2 ≠ 0 for some k_1, k_2 ∈ C_c. That is, there exists a cluster c such that i, j ∈ Ch(c). Conversely, if (i, j) ∉ E_c, then F_ik = 0 or F_jk = 0, ∀k ∈ C_c, ∀c ∈ [2^m], except for a few pathological cases that occur with Lebesgue measure zero. We ignore the pathological cases here.
Hence, there does not exist any cluster c such that both i, j ∈ Ch(c). □

Acknowledgment: The authors acknowledge the support of NSF for this research through the project titled "RAPID: COVID-19 Transmission Network Reconstruction from Time-Series Data" under Award Number 2030096.

References:
- Identification of sparse reciprocal graphical models
- Arma identification of graphical models
- The book of GENESIS: exploring realistic neural models with the GEneral NEural SImulation System
- Financial dynamical systems. Differential Geometry-Dynamical Systems
- Graph-based learning under perturbations via total least-squares
- Rank-sparsity incoherence for matrix decomposition
- Latent variable graphical model selection via convex optimization
- Robust identification of "sparse plus low-rank" graphical models: An optimization approach
- Factor models with real data: A robust estimation of the number of factors
- Learning latent variable dynamic graphical models by confidence sets selection
- Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra
- Learning ar factor models
- Granger-causality meets causal inference in graphical models: Learning networks via non-invasive observations
- A control theoretic look at granger causality: extending topology reconstruction to networks with direct feedthroughs
- Van den Hof. Identifiability of linear dynamic networks through switching modules
- A robust approach to arma factor modeling
- Van den Hof. A scalable multi-step least squares method for network identification with unknown disturbance topology
- Advanced spectral methods for climatic time series
- Matrix Analysis
- Modeling the topology of a dynamical network via wiener filtering approach
- Sparse plus low-rank autoregressive identification in neuroimaging time series
- Yalmip: a toolbox for modeling and optimization in matlab
- Optimal allocation of excitation and measurement for identification of dynamic networks
- Topological identification in networks of dynamical systems
- Network reconstruction of dynamical polytrees with unobserved nodes
- On the problem of reconstructing an unknown topology via locality properties of the wiener filter
- Signal selection for estimation and identification in networks of dynamic systems: A graphical model approach
- Online non-linear topology identification from graph-connected time series
- Random Feature Approximation for Online Nonlinear Graph Topology Identification
- Distributed apportioning in a power network for providing demand response services
- Directed information graphs
- A local direct method for module identification in dynamic networks with correlated noise
- Blind learning of tree network topologies in the presence of hidden nodes
- An algorithm to learn polytree networks with hidden nodes
- Topology identification of directed graphs via joint diagonalization of correlation matrices
- Single module identifiability in linear dynamic networks with partial excitation and measurement
- Topology selection in graphical models of autoregressive processes
- Network topology identification from corrupt data streams
- Topology learning of radial dynamical systems with latent nodes
- Physics informed topology learning in networks of linear dynamical systems
- Sdpt3: a matlab software package for semidefinite-quadratic-linear programming
- Identification of dynamic models in complex networks with prediction error methods: basic methods for consistent module estimates
- Topology learning of linear dynamical systems with latent nodes using matrix decomposition
- Power generation, operation, and control
- Ar identification of latent-variable graphical models
- Sparse plus low rank network identification: A nonparametric approach