title: Discovering underlying dynamics in time series of networks
authors: Athreya, Avanti; Lubberts, Zachary; Park, Youngser; Priebe, Carey E
date: 2022-05-13

Understanding dramatic changes in the evolution of networks is central to statistical network inference, the importance of which has been underscored by the challenges of predicting and distinguishing pandemic-induced transformations in organizational and communication networks. We address this problem of multi-sample network inference with a model in which each node has an associated time-varying low-dimensional latent position vector of feature data and connection probabilities are functions of these vectors. Under mild assumptions, the time-varying evolution of the constellation of latent positions exhibits low-dimensional manifold structure under a suitable notion of distance. This distance can then be approximated by a measure of separation between the networks themselves. We can then find consistent Euclidean representations for the underlying network structure, as characterized by these distances, at any given time. This permits the visualization of network evolution over time, which yields a flow in Euclidean space. Moreover, it transforms important network inference questions, such as change-point and anomaly detection, into a classically familiar setting and reveals underlying temporal dynamics of the network. We illustrate our methodology with real and synthetic data, and in both cases, are able to formulate and identify a change point, corresponding to a massive shift in pandemic policy, in a communication network of a large organization.

The structure of many organizational and communication networks underwent a dramatic shift during the disruption of the COVID-19 pandemic in 2020, a visualization of which can be seen in Figure 1. Such transformations give rise to several important statistical questions: how to construct useful measures of dissimilarity across networks; how to estimate any such measure of dissimilarity from random network realizations; and how to identify the loci of change. Our goal in this paper is to build a robust methodology to model and infer important characteristics of network time series. To this end, we focus on a class of time series of random networks. We define an intuitive distance between the evolution of certain random variables that govern the behavior of nodes in the networks and prove that this distance can be consistently estimated from the observed networks. When this distance is sufficiently similar to a Euclidean distance, multidimensional scaling extracts a curve in low-dimensional Euclidean space that mirrors the structure of the network dynamics. This permits a visualization of network evolution and identification of change points. Figure 2 is the result of an end-to-end case study using these techniques for a time series of communication networks in a large corporation in the months around the start of pandemic work-from-home protocols: see the dramatic change in both panels beginning in Spring 2020.

Analysis of multiple networks is a key emerging subdiscipline of network inference, with approaches including joint spectral embedding [1, 5, 7, 8, 12, 14], tensor decompositions [7, 10, 22], least-squares methods [11, 15], and multiscale methods via random walks on graphs [9].
Asymptotic properties of these methods, however, depend on particular model assumptions for how the networks evolve over time and relate to one another, and rigorous performance guarantees are challenging and limited in scope. On the one hand, single- and multiple-network inference problems often have related objectives. For example, if the data include multiple network realizations from the same underlying model on the same set of aligned vertices, we may wish to exploit these additional realizations for more accurate estimation of common network parameters; that is, use the replications in a multiple-network setting to refine parameter estimates that govern any single network in the collection. On the other hand, multiple-network inference involves statistically distinct questions, such as identifying loci of change across networks or detecting anomalies in a time series of networks.

Our approach rests on a distance between the latent position random variables that generate the networks, and this distance, under mild conditions, is Euclidean realizable; that is, the random variables at each time have a representation as points in R^c for some dimension c, where the metric space distances between them are equal to the Euclidean distances between these points (see [4] for more on Euclidean realizability of dissimilarity matrices). This allows us to visualize the time evolution of this stochastic process as the image of a map from an interval into R^c. We use this idea to formulate a novel approach to network time series. We demonstrate methods for consistently estimating a Euclidean representation, or mirror, of the evolution of the latent position distributions from the observed networks. This mirror can reveal important underlying structure of the network dynamics, as we demonstrate in both simulated and real data, the latter of which is drawn from organizational and communication networks, revealing the change point corresponding to the start of pandemic work-from-home orders.

We organize the paper as follows. In Section 2, we define our network model, our distance measure between pairs of networks in the time series, and precise notions of Euclidean realizability. We show that, under mild assumptions, we can associate the network evolution to a one-dimensional manifold that preserves key structure. We detail several concrete examples of network time-series models, exploring the geometric structure under various model assumptions. In Section 3, we prove that our distance measure can be consistently estimated from the observed networks, obtaining probabilistic concentration as the network size grows. We use this concentration to demonstrate that the Euclidean mirror may be consistently recovered through multidimensional scaling applied to the approximated distance matrix, producing a visualization that captures the true structure of the network evolution. We then test these methods in Section 4, applying our results to both simulated and real data examples. As a particular case study illustrating our approach, we consider a time series of communication networks for a large corporation during the start of pandemic work-from-home orders. In particular, we are able to clearly locate and visualize the pandemic shift. We discuss these results and possible future work in Section 5, leaving all proofs and some supplemental results for the appendices.

In order to model the intrinsic characteristics of the entities in our network, we consider latent position random graphs, which associate a vector of features in R^d to each vertex in the network.
The connections between vertices in the network are independent given the latent positions, with connection probabilities depending on the latent position vectors of the two vertices in question.

Definition 1 (Latent Position Graph, Random Dot Product Graph, and Generalized Random Dot Product Graph). We say that the random graph G with adjacency matrix A ∈ R^{n×n} is a latent position random graph (LPG) with latent position matrix X ∈ R^{n×d}, having rows X_1, ..., X_n ∈ 𝒳 ⊆ R^d, and link function κ : 𝒳 × 𝒳 → [0, 1] if, conditional on X, the entries {A_{ij} : i < j} are independent with A_{ij} ∼ Bernoulli(κ(X_i, X_j)). When κ(x, y) = x^⊤y, we say that G is a random dot product graph (RDPG) and we call P = XX^⊤ the connection probability matrix. When κ([x_1, x_2]^⊤, [y_1, y_2]^⊤) = x_1^⊤y_1 − x_2^⊤y_2, we say that G is a generalized random dot product graph (GRDPG) and we call P = X I_{p,q} X^⊤ the generalized edge connection probability matrix, where I_{p,q} = I_p ⊕ (−I_q).

Remark 1 (Orthogonal nonidentifiability in RDPGs). Note that if X ∈ R^{n×d} is a matrix of latent positions and W ∈ R^{d×d} is orthogonal, X and XW give rise to the same distribution over graphs. Thus, the RDPG model has a nonidentifiability up to orthogonal transformation. Analogously, the GRDPG model has a nonidentifiability up to indefinite orthogonal transformations.

Since we wish to model randomness in the underlying features of each vertex, we will consider latent positions that are themselves random variables defined on a probability space (Ω, F, P). For a particular sample point ω ∈ Ω, let X(t, ω) ∈ R^d be the realization of the associated latent position for this vertex at time t ∈ [0, T]. On the one hand, for fixed ω, as t varies, X(t, ω), 0 ≤ t ≤ T, is the realized trajectory of a d-dimensional stochastic process. On the other hand, for a given time t, the random variable X(t, ·) represents the constellation of possible latent positions at this time. In order for the inner product to be a well-defined link function, we require that the distribution of X(t, ·) follow an inner product distribution:

Definition 2. Let F be a probability distribution on R^d. We say that F is a d-dimensional inner product distribution if 0 ≤ x^⊤y ≤ 1 for all x, y ∈ supp F. We will suppose throughout this work that for a d-dimensional inner product distribution F and X ∼ F, E[XX^⊤] has rank d.

We wish to quantify the difference between the random vectors X(t, ·) and X(t', ·). Suppose that the graphs come from an RDPG or GRDPG model, where at each time t, the latent positions of each graph vertex are drawn independently from a common inner product latent position distribution F_t. Because X(t, ·) is a latent position, we necessarily have X(t, ·) ∈ L^2(Ω); for notational simplicity, we will use X(t, ·) and X_t interchangeably. We define a norm, which we call the maximum directional variation norm, on this space of random variables; this norm leads to a natural metric d_MV(X_t, X_{t'}), both of which are described below.

Definition 3 (Maximum directional variation norm and metric). For a random variable X ∈ L^2(Ω), we define
∥X∥_MV = max_{∥u∥=1} (E[⟨X, u⟩^2])^{1/2} = ∥E[XX^⊤]∥_2^{1/2},
where the maximization is over u ∈ R^d with ∥u∥ = 1. We define an associated metric d_MV by minimizing the norm of the difference between the random variables X_t, X_{t'} over all orthogonal transformations, which aligns these distributions:
d_MV(X_t, X_{t'}) = min_{W ∈ O_{d×d}} ∥X_t − W X_{t'}∥_MV = min_{W ∈ O_{d×d}} ∥E[(X_t − W X_{t'})(X_t − W X_{t'})^⊤]∥_2^{1/2},
where the matrix norm on the right-hand side is the spectral norm. Note that the norm ∥·∥_MV only captures the directional variation when X is centered.
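To make Definition 3 concrete, the short sketch below (our own illustration, not part of the paper) estimates ∥X∥_MV and d_MV by Monte Carlo for a toy two-dimensional example in which one latent position random variable is a drifted copy of the other. The minimization over the orthogonal group is done by brute-force grid search over rotations and reflections; all function names, variable names, and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mv_norm(X):
    """Maximum directional variation norm ||X||_MV = ||E[X X^T]||_2^{1/2},
    estimated from an i.i.d. sample whose rows are realizations of X."""
    second_moment = X.T @ X / X.shape[0]   # Monte Carlo estimate of E[X X^T]
    return np.sqrt(np.linalg.norm(second_moment, 2))

def d_mv(X, Y, n_grid=720):
    """d_MV(X, Y) = min over orthogonal W of ||X - W Y||_MV, here for d = 2,
    with the minimization done by brute-force grid search over rotations and
    reflections (an illustrative choice, not an efficient algorithm)."""
    best = np.inf
    for theta in np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        for W in (np.array([[c, -s], [s, c]]),    # rotation by theta
                  np.array([[c, s], [s, -c]])):   # reflection
            best = min(best, mv_norm(X - Y @ W.T))
    return best

# Toy pair of jointly distributed latent position variables: X_t equals X_s
# plus a deterministic drift and a little noise (same rows = same omega).
n = 5000
Xs = rng.uniform(0.2, 0.6, size=(n, 2))
drift = np.array([0.10, 0.05])
Xt = Xs + drift + 0.01 * rng.standard_normal((n, 2))

print("||X_s||_MV      approx.", round(mv_norm(Xs), 4))
print("d_MV(X_t, X_s)  approx.", round(d_mv(Xt, Xs), 4))   # close to ||drift||
```

In this example the aligning matrix is essentially the identity, and the estimated d_MV is dominated by the norm of the drift, which is exactly the behavior that the martingale-plus-drift results below formalize.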
In the cases of interest, we wish to capture the drift in the latent position, X_t − W X_{t'}: this is the origin of the name for this metric and its associated norm.

Remark 2. The metric d_MV is not properly a metric on L^2(Ω), since if X = WY a.s. for some orthogonal matrix W, then d_MV(X, Y) = 0. However, if we consider the equivalence relation defined by X ∼ Y whenever X = WY for some orthogonal matrix, this is a metric on the corresponding set of equivalence classes. This means that we are able to absorb the non-identifiability from the original parameterization, obtaining a new parameter space with a metric space structure where the underlying distribution is identifiable. One of the central contributions of this paper is that d_MV captures important features of the time-varying distributions F_t.

To describe a family of networks indexed by time, each of which is generated by a matrix of latent positions that are themselves random, we consider a latent position stochastic process.

Definition 4 (Latent position process). Let F_t, 0 ≤ t ≤ T, be a filtration of F. A latent position process ϕ(t) is an F_t-adapted map ϕ : [0, T] → (L^2(Ω), d_MV) such that for each t ∈ [0, T], ϕ(t) = X(t, ·) has an inner product distribution. We say that a latent position process is nonbacktracking if, whenever d_MV(ϕ(s), ϕ(t)) = 0 for some s < t, then d_MV(ϕ(s), ϕ(r)) = 0 for all r ∈ [s, t].

Once we have the latent position stochastic process, we can construct a time series of latent position random networks whose vertices have independent, identically distributed latent positions given by ϕ(t).

Definition 5 (Time Series of LPGs). Let ϕ be a latent position process, and fix a given number of vertices n and a collection of times 𝒯 ⊆ [0, T]. We draw an i.i.d. sample ω_j ∈ Ω for 1 ≤ j ≤ n, and obtain the latent position matrices X_t ∈ R^{n×d} for t ∈ 𝒯 by stacking the rows X(t, ω_j), 1 ≤ j ≤ n. The time series of LPGs (TSG) {G_t : t ∈ 𝒯} are conditionally independent LPGs with latent position matrices X_t, t ∈ 𝒯.

We emphasize that each vertex in the TSG corresponds to a single ω ∈ Ω, which induces dependence between the latent positions for that vertex across time points, but the latent position trajectories of any two distinct vertices are independent of one another across all times. Since these trajectories form an i.i.d. sample from the latent position process, it is natural to measure their evolution over time using the metric on the corresponding random variables, namely d_MV(X_t, X_{t'}). In the definition of this distance, the expectation is over ω ∈ Ω, which means that it depends on the joint distribution of X_t and X_{t'}. In particular, d_MV depends on more than just the marginal distributions of the random vectors X_t and X_{t'} individually, but takes into account their dependence inherited from the latent position process ϕ.

One key question is whether the image ϕ([0, T]) has useful geometric structure when equipped with the metric d_MV. It turns out that, under mild conditions, this image is a manifold. In addition, the map ϕ admits a Euclidean analogue, called a mirror, which is a finite-dimensional curve that retains important signal from the generating stochastic process for the network time series. To make this precise, we define the notions of Euclidean realizability and approximate Euclidean realizability below, and provide several examples of latent position processes that satisfy these requirements.

Definition 6 (Notions of Euclidean realizability). Let ϕ be a latent position process.
We say that ϕ is approximately (Lipschitz) Euclidean c-realizable with mirror ψ if there exists a Lipschitz continuous curve ψ : [0, T] → R^c and some constant C > 0 such that
|d_MV(ϕ(t), ϕ(s)) − ∥ψ(t) − ψ(s)∥| ≤ C|t − s| for all s, t ∈ [0, T].
For a fixed α ∈ (0, 1), we say that ϕ is approximately α-Hölder Euclidean c-realizable if ψ is α-Hölder continuous, and there is some C > 0 such that
|d_MV(ϕ(t), ϕ(s)) − ∥ψ(t) − ψ(s)∥| ≤ C|t − s|^α for all s, t ∈ [0, T].
Rather than c-realizable, we may simply say realizable if there is some c for which this holds; we say that ϕ is Hölder Euclidean realizable if the condition holds for some α ∈ (0, 1].

Approximate Euclidean realizability is sufficient for the image of ϕ to be a manifold.

Theorem 1 (Manifold properties of a nonbacktracking latent position process). Let ϕ be a nonbacktracking latent position process which is approximately Euclidean realizable. Then ϕ([0, T]) is homeomorphic to an interval [0, I]. In particular, it is a topological 1-manifold with boundary. If ϕ is injective and approximately α-Hölder Euclidean realizable, the same conclusion holds.

If we suppose that the trajectories of ϕ satisfy a certain degree of smoothness, it turns out that the function ϕ also has this degree of smoothness. That is, sample-path smoothness implies smoothness of the map ϕ into the space of random variables, when equipped with our metric.

Theorem 2. Suppose that for some α ∈ (0, 1] the trajectories of ϕ satisfy ∥X(t, ω) − X(s, ω)∥ ≤ L(ω)|t − s|^α for all s, t ∈ [0, T], where E[L(ω)^2] < ∞. Then ϕ : ([0, T], |·|) → (L^2(Ω), d_MV) is Hölder continuous with this same α.

Remark 3. In the above definitions of realizability, regularity conditions are imposed on ψ, which takes values in R^c, rather than on ϕ, which gives random variables as output. Moreover, ψ is the Euclidean realization of the manifold ϕ([0, T]) in the space of random variables; this is an approximately distance-preserving representation of those random variables, each of which captures the full state of the system with all of the given entities at any time t. As we show, estimates of this Euclidean mirror, derived from observations of graph connectivity structure at a collection of time points, can recover important features of the time-varying latent positions.

There are several natural classes of latent position processes that are approximately Lipschitz or α-Hölder Euclidean realizable. The next theorem demonstrates approximate α-Hölder Euclidean realizability for any latent position process expressible as the sum of a deterministic drift and a martingale term whose increments have well-controlled variance.

Theorem 3 (Approximate Hölder realizability of variance-controlled martingale-plus-drift processes). Suppose M_t is a mean-zero F_t-martingale with respect to the filtration {F_t : 0 ≤ t ≤ T}, suppose γ : [0, T] → R^d is deterministic, and let ϕ(t) = γ(t) + M_t. When M_t satisfies ∥Cov(M_t − M_s)∥_2 ≤ C(t − s) for s ≤ t, and γ(t) = a(t)v for some v ∈ R^d and Lipschitz continuous a : [0, T] → R, then ϕ is approximately α-Hölder Euclidean realizable with α = 1/2 and c = 1.

Example 1. Let ϕ(t) = (at + b)v + σI_t, where v ∈ R^d and I_t = ∫_0^t B_s ds is an integrated standard Brownian motion. Then ϕ is approximately Lipschitz Euclidean 1-realizable.

Example 2. Let ϕ(t) = γ(t) + σB_t, the sum of a deterministic drift γ(t) = a(t)v as in Theorem 3 and a standard Brownian motion. Then ϕ is approximately α-Hölder Euclidean 1-realizable with α = 1/2.

The latent positions for the vertices in our network are not typically observed; instead, we only see the connectivity between the nodes in the network, from which a given realization of the latent positions can, under certain model assumptions, be accurately estimated. In order to compare the networks at times t and t', we can consider estimates of the networks' latent positions at these two times as noisy observations from the joint distribution of (X_t, X_{t'}), and deploy these estimates in an approximation of the distances d_MV(X_t, X_{t'}). Using these approximate distances, we can then estimate the curve ψ(t), giving a visualization for the evolution of all of the latent positions in the random graphs over time.
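Before turning to estimation, a small simulation sketch (ours, with illustrative parameter choices) shows the martingale-plus-drift behavior described above: for a process of the form ϕ(t) = t·v + σB_t, the population d_MV distance computed with the aligning matrix W = I, as in the example arguments, tracks the one-dimensional mirror ψ(t) = t∥v∥ up to a Brownian correction of order √|t − s|.

```python
import numpy as np

rng = np.random.default_rng(1)

# Drift-plus-Brownian-motion latent position process, in the spirit of the
# martingale-plus-drift discussion above: phi(t) = t * v + sigma * B_t.
# Parameter values, and the use of W = I as the aligning orthogonal matrix
# (as in the example arguments), are our own illustrative choices.
d, n, sigma = 3, 20000, 0.05
v = np.array([0.6, 0.3, 0.2])
times = np.linspace(0.0, 1.0, 6)

paths = np.zeros((len(times), n, d))      # one row per "vertex" omega
B = np.zeros((n, d))                      # Brownian motion, shared across times
for k, t in enumerate(times):
    if k > 0:
        dt = times[k] - times[k - 1]
        B += np.sqrt(dt) * rng.standard_normal((n, d))
    paths[k] = t * v + sigma * B

def mv_norm(Z):
    """Sample estimate of ||Z||_MV = ||E[Z Z^T]||_2^{1/2}."""
    return np.sqrt(np.linalg.norm(Z.T @ Z / Z.shape[0], 2))

# Monte Carlo d_MV (with W = I) versus the closed-form value
# sqrt(|t-s|^2 ||v||^2 + sigma^2 |t-s|) and the mirror |t-s| * ||v||.
for i in range(len(times)):
    for j in range(i + 1, len(times)):
        gap = abs(times[i] - times[j])
        dmv = mv_norm(paths[i] - paths[j])
        closed_form = np.sqrt(gap ** 2 * (v @ v) + sigma ** 2 * gap)
        mirror = gap * np.sqrt(v @ v)
        print(f"|t-s|={gap:.1f}  d_MV~{dmv:.4f}  closed-form={closed_form:.4f}  mirror={mirror:.4f}")
```

The gap between d_MV and the mirror distance shrinks like √|t − s|, which is the α = 1/2 Hölder realizability asserted by Theorem 3.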
Suppose that G is a random dot product graph with latent position matrix X, where the rows of X are independent, identically distributed draws from a latent position distribution F on R^d. Let A be the adjacency matrix for this graph. As shown in [17], a spectral decomposition of the adjacency matrix yields consistent estimates for the underlying matrix of latent positions. We introduce the following definition.

Definition 7 (Adjacency Spectral Embedding). Given an adjacency matrix A, we define the adjacency spectral embedding with dimension d as X̂ = Û Ŝ^{1/2}, where Û ∈ R^{n×d} is the matrix of the d top eigenvectors of A and Ŝ ∈ R^{d×d} is the diagonal matrix with the d largest eigenvalues of A on the diagonal.

As we show in the next section, we will use the ASE of the observed adjacency matrices in our TSG to estimate the d_MV distance between latent position random variables over time, and in turn, to estimate the Euclidean mirror, which records important underlying structure for the time series of networks.

Given a time series of graphs with approximately α-Hölder Euclidean realizable latent position process ϕ, our goal is to estimate the Euclidean mirror ψ. Since this curve belongs to some finite-dimensional Euclidean space, the distances ∥ψ(t) − ψ(s)∥ can be used to recover this mirror (up to rigid transformations) from classical multidimensional scaling (CMDS). As such, the crucial estimation problem is one of accurately estimating the distances d_MV(ϕ(t), ϕ(s)). To this end, we define the estimated pairwise distance between any two such n×d latent position matrices X̂_t and X̂_s as follows:
d̂_MV(X̂_t, X̂_s) = min_{W ∈ O_{d×d}} n^{−1/2} ∥X̂_t − X̂_s W∥_2,
where O_{d×d} is the set of real orthogonal matrices of order d, and ∥·∥_2 denotes the spectral norm. Our central result is that, when our networks have a sufficiently large number of vertices n, d̂_MV provides a consistent estimate of d_MV.

Theorem 4. There is a constant C > 0 such that, with overwhelming probability,
|d̂_MV(X̂_t, X̂_s)^2 − d_MV(X_t, X_s)^2| ≤ C log(n)/√n.

In order to characterize the evolution of time series of networks, consider a finite subset of times 𝒯 ⊆ [0, T] with |𝒯| = m_𝒯, and define the m_𝒯 × m_𝒯 matrices
D_ψ = [d_MV(X_t, X_s)]_{t,s ∈ 𝒯} and D̂_ψ = [d̂_MV(X̂_t, X̂_s)]_{t,s ∈ 𝒯}.
These are two dissimilarity matrices; the first records the pairwise distances between the latent position random variables at times t and s; the second records the estimate for this distance based on the observed network realizations at times t and s. Suppose D^(2)_ψ is the matrix of squared entries of D_ψ, and D̂^(2)_ψ is the matrix of squared entries of D̂_ψ. We derive the following corollary:

Corollary 1. With overwhelming probability, ∥D̂^(2)_ψ − D^(2)_ψ∥_F ≤ C m_𝒯 log(n)/√n.

It turns out that for large networks, the invariant subspace associated to D̂^(2)_ψ is an accurate approximation to the corresponding subspace of D^(2)_ψ. This suggests that applying CMDS to the estimated dissimilarity matrix D̂_ψ can recover the original curve ψ(t) up to a rotation. When D_ψ is a Euclidean distance matrix, CMDS computes the scaled eigenvectors of the matrix −(1/2) P D^(2)_ψ P, where P = I − J_{m_𝒯}/m_𝒯 is a projection matrix, and J_{m_𝒯} is the m_𝒯 × m_𝒯 matrix of all ones. Since this matrix may be written as U S U^⊤ for some m_𝒯 × c matrix U with orthonormal columns and diagonal matrix S, this means that ψ(t_i) is simply the i-th row of the matrix U S^{1/2}. We will analogously denote the i-th row of Û Ŝ^{1/2}, the output of CMDS applied to D̂_ψ, by ψ̂(t_i).

Theorem 5. Suppose D_ψ is a Euclidean distance matrix. Let Û, U ∈ R^{m_𝒯 × c} be the matrices of top c eigenvectors, and Ŝ, S ∈ R^{c×c} be the diagonal matrices with diagonal entries equal to the top c eigenvalues, of −(1/2) P D̂^(2)_ψ P and −(1/2) P D^(2)_ψ P, respectively, where P = I − J_{m_𝒯}/m_𝒯, and J_{m_𝒯} is the all-ones matrix of order m_𝒯.
Then there is a constant C such that, with overwhelming probability, there is a real orthogonal matrix R ∈ O_{c×c} for which ∥Û − U R∥ and the CMDS error ∥Û Ŝ^{1/2} − U S^{1/2} R∥ are both at most C m_𝒯 log(n)/√n, where C depends on the nonzero eigenvalues of −(1/2) P D^(2)_ψ P. In particular, we have max_{1 ≤ i ≤ m_𝒯} ∥ψ̂(t_i) − R^⊤ ψ(t_i)∥ → 0 as n → ∞.

As such, if the important aspects of the latent position process, such as changepoints or anomalies, are reflected in low-dimensional Euclidean space, then we recover this mirror consistently through CMDS applied to the estimated distance matrix. We summarize these connections between true distances, their estimates, and associated Euclidean mirrors in Figure 3.

We start with a more detailed discussion of the communication network example shown in Figure 2. We consider a time series of weighted communication networks, arising from the email communications between 32277 entities in a large organization, with one network being generated each month from January 2019 to December 2020, a period of 24 months. This data was studied through the lens of modularity in [23]. We obtain a clustering by applying Leiden clustering [20] to the January 2019 network, obtaining 33 clusters that we retain throughout the two-year period. We make use of this clustering to compute the Graph Encoder Embedding (GEE) of [16], which produces spectrally derived estimates of invertible transformations of the original latent positions. For each time t = 1, ..., 24, we obtain a matrix Ẑ_t ∈ R^{32277×33}, each row of which provides an estimate of these transformed latent positions. Constructing the distance matrix D̂_ψ = [d̂_MV(Ẑ_t, Ẑ_{t'})] ∈ R^{24×24}, we apply CMDS to obtain the estimated curve ψ̂ shown in the left panel of Figure 2, where the choice of dimension c = 2 is based on the scree plot of D̂_ψ. The nonlinear dimensionality reduction technique ISOMAP [19], which relies on a spectral decomposition of geodesic distances, can be applied to these points to extract an estimated one-dimensional curve, which we plot against time in the right panel of Figure 2. This one-dimensional curve exhibits some changes from the previous trend in Spring 2020 and a much sharper qualitative transformation in July 2020. What is striking is that both these qualitative shifts correspond to policy changes: in Spring 2020, there was an initial shift in operations, widely regarded at the time as temporary. In mid-summer 2020, near the peak of the second wave of COVID-19, it was much clearer that these organizational shifts were likely permanent, or at least significantly longer-lived.

Figure 4 (caption): For each subcommunity, we apply our methods to obtain an ISOMAP representation, yielding a single grey curve. The green and blue highlighted curves correspond to the communities highlighted in Figure 1, which exhibit constant drift over the two years (green) versus a major shock during the pandemic (blue). The overall network ISOMAP embedding from Figure 2 is overlaid for context.

In Figure 4, we plot the result of our methods applied to the induced subgraphs corresponding to each of the 33 communities; these are represented by the grey trajectories. The trajectories of two subcommunities have been highlighted in the plot above: the green curve shows a constant rate of change throughout the two-year period, and does not exhibit a noticeable pandemic effect. The blue curve, on the other hand, shows a significant flattening in early 2020, followed by rapid changes in summer. Thus, we can see a differential effect of the pandemic on different work groups within the organization.
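The end-to-end procedure used above can be summarized as a short pipeline: spectrally embed each network, estimate the pairwise d_MV distances via an orthogonal alignment, apply CMDS to the squared-distance matrix, and extract a one-dimensional curve with ISOMAP. The sketch below is a minimal illustration of that pipeline and is not the authors' code: it uses the ASE of Definition 7 rather than the GEE employed in the case study, it uses the Frobenius-optimal Procrustes rotation as a practical surrogate for the exact minimization over W in d̂_MV, it uses scikit-learn's Isomap for the final step, and all function names are ours; it assumes symmetric, hollow adjacency matrices on a shared vertex set.

```python
import numpy as np
from sklearn.manifold import Isomap   # used only for the final 1-D curve

def ase(A, d):
    """Adjacency spectral embedding (Definition 7): X_hat = U_hat S_hat^{1/2},
    built from the d largest eigenvalues of a symmetric adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def d_mv_hat(Xt, Xs):
    """Estimated distance n^{-1/2} min_W ||Xt - Xs W||_2.  As a practical
    surrogate for minimizing the spectral norm over orthogonal W, we use the
    Frobenius-optimal Procrustes rotation (an assumption of this sketch)."""
    U, _, Vt = np.linalg.svd(Xs.T @ Xt)
    W = U @ Vt
    return np.linalg.norm(Xt - Xs @ W, 2) / np.sqrt(Xt.shape[0])

def cmds(D, c):
    """Classical multidimensional scaling of a distance matrix D into R^c."""
    m = D.shape[0]
    P = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * P @ (D ** 2) @ P
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:c]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

def mirror_pipeline(adjacencies, d, c=2):
    """Adjacency matrices -> ASEs -> pairwise d_MV estimates -> CMDS mirror
    -> one-dimensional ISOMAP curve, mirroring the workflow described above."""
    X_hats = [ase(A, d) for A in adjacencies]
    m = len(X_hats)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = d_mv_hat(X_hats[i], X_hats[j])
    psi_hat = cmds(D, c)
    curve = Isomap(n_neighbors=min(5, m - 1), n_components=1).fit_transform(psi_hat)
    return D, psi_hat, curve
```

For the corporate email data, one would replace the ASE step with the GEE estimates Ẑ_t and take c = 2, as chosen from the scree plot of D̂_ψ.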
In Figure 5, both panels show methods for identifying changepoints over the 24 months, with consistent results. In the left panel, for each time starting in June 2019, we plot the sigmage of its ISOMAP embedding relative to the previous 5 months: that is, we measure the distance of the ISOMAP embedding to the mean of the previous 5 months' embeddings, relative to the standard deviation of those embeddings. Note that since the computation of the sigmages requires a window of time points, we are only able to produce these estimates starting in June 2019. We see apparent outliers in March and April 2020, and again in July–September 2020. The right panel shows the ISOMAP curve with a moving prediction confidence interval of width 5 standard deviations, generated from linear regression applied to the previous 5 time points (which is why we again only have an interval starting in June). This method indicates the same set of outliers as the previous one, but allows for some more detailed analysis: in March and April 2020, it appears that the behavior is anomalous because the network stopped drifting, while the behavior in the summer of that year is anomalous because it made a significant jump from its previous position.

In the previous section, we apply the GEE embedding to obtain estimates Ẑ_t, which are then used for estimates of the pairwise distances d̂_MV(Ẑ_t, Ẑ_s). Although the GEE differs slightly from the adjacency spectral embedding, it is computationally more tractable and yields similarly useful output. To further illustrate our underlying theory, however, we consider synthetic data. That is, we use real data to obtain a distribution from which we may resample. Such a network bootstrap permits us to test our asymptotic results through replicable simulations that are grounded in actual data. To this end, we consider the true latent position distribution at each time to be equally likely to be any row of the GEE-obtained estimates from the real data, Ẑ_t ∈ R^{32277×33}, for t = 1, ..., 24. Given a sample size n_s, for each time, we sample these rows uniformly and with replacement to get a matrix of latent positions X_t ∈ R^{n_s×33}. We treat this matrix as the generating latent position matrix for independent adjacency matrices A_t ∼ RDPG(X_t). Note that if for sample i, we choose row j of Ẑ_1 at time t = 1, then the same row j of Ẑ_t will be used for all times t = 1, ..., 24 for that sample, so that the original dependence structure is preserved. We may now apply the methods described in our theorems, namely ASE of the adjacency matrices followed by Procrustes alignment, to obtain the estimates X̂_t, along with the associated distance estimates.

In Figure 6, we see that ISOMAP applied to the CMDS embedding of the bootstrapped data converges to the original ISOMAP curve, as predicted by our theorems. We also wish to check whether this procedure consistently demonstrates the pandemic effect. In Figure 7, we show the sigmages for each month, plotted over 100 replicates of this experiment, with n_s = 30000 for each replicate. The pandemic effect in summer of 2020 is clearly visible in all but a few replicates, while the effect in March–April is still identified in the majority of replicates. We also observe the dramatic change in variance for certain months over the different replicates: this might reflect discrepancies in the pandemic effect across different network entities, rendering the final estimate much more sensitive to the sample of rows used to generate the network.
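The resampling scheme just described can be written compactly. The sketch below is our own illustration of it, with hypothetical function and variable names; the clipping of inner products to [0, 1] is an added safeguard rather than something specified above.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_networks(Z_hats, n_s):
    """Network bootstrap sketch: Z_hats is a list of estimated latent position
    matrices (one per time, all with the same row/vertex order).  We draw one
    set of row indices with replacement and reuse it at every time point, so a
    resampled vertex keeps its own trajectory, and then generate an independent
    RDPG adjacency matrix at each time."""
    n = Z_hats[0].shape[0]
    rows = rng.integers(0, n, size=n_s)          # same omegas reused across time
    adjacencies = []
    for Z in Z_hats:
        X = Z[rows]                              # latent positions for this replicate
        P = np.clip(X @ X.T, 0.0, 1.0)           # connection probabilities (clipped)
        upper = rng.random((n_s, n_s)) < P       # Bernoulli draws
        A = np.triu(upper, k=1)
        A = (A + A.T).astype(float)              # symmetric, hollow adjacency
        adjacencies.append(A)
    return adjacencies
```

Each call produces one bootstrap replicate; feeding its output to the estimation pipeline sketched earlier and recomputing the sigmages reproduces the kind of replicate-level variability summarized in Figure 7.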
To effectively model time series of networks, it is natural to consider network evolution governed by underlying low-dimensional dynamics. In this paper, we examine latent position networks in which the vertex latent positions follow a stochastic process. Under mild conditions, we can associate to this stochastic process certain geometric structure, and understanding how that structure changes with time allows us to identify transformations in network behavior across multiple scales. To make this intuition precise, we define the maximum directional variation norm and metric on the space of random latent positions. We describe notions of Euclidean realizability for this metric, characterizing how closely this metric can be approximated by a Euclidean distance. If the latent position process is sufficiently regular, its range is a one-dimensional manifold, and we detail several useful examples of such processes.

Of course, the latent position process is typically unobserved; what we have instead is a time series of networks from which these latent positions must be estimated. One of our key results is that the pairwise dissimilarity matrix of maximum directional variation distances between latent positions X_t and X_{t'} at pairs of time points can be consistently estimated by spectrally embedding the network adjacencies at these pairs of times and computing spectral norm distances between these embeddings. When the latent position process is such that the maximum directional variation metric between any pair of latent positions X_t, X_{t'} is approximately Euclidean realizable, we find that classical multidimensional scaling applied to the estimated distances gives us an inferentially valuable low-dimensional representation of network dissimilarities across time. Additional dimension reduction techniques, such as ISOMAP, can further clarify changes in network dynamics. To this last point, ISOMAP is a manifold-learning algorithm; detailed analysis of its effect on embeddings of estimated pairwise network distances can bring us closer to provable guarantees for change-point detection. More broadly, the interplay between the probabilistic structure of the underlying latent position process and the geometric structure of the Euclidean mirror is a key component of the estimated Euclidean representation of relationships between networks across time.

We consider two estimates for the maximum directional variation distance d_MV, namely the spectral norm applied to the GEE estimates and the Procrustes-aligned ASEs. However, these are far from the only options, and it is an open question whether d_MV or another metric on the space of random variables is best for downstream inference tasks under certain model assumptions. It is also an open question whether there is a better estimate for the distance d_MV itself, either in terms of computational complexity or statistical properties. Of particular interest is the spectral norm distance applied to the omnibus embeddings [12] for the adjacency matrices: this likely converges to another distance on the space of random variables, potentially highlighting different features in the final CMDS embedding. Results quantifying the distribution of the errors in the CMDS embedding are key to formulating hypothesis tests for changepoint detection.
The perspective described in Figure 3, which connects distance metrics for generative processes of networks to their estimates, and in turn translates manifold geometry into Euclidean geometry, is an important theoretical contribution to time series analysis for networks. It provides mathematical formalism for network dynamics; asymptotic properties of estimates of manifold structure; and rigorous guarantees on when time-varying networks can be well represented in low-dimensional space. Our family of latent position networks is interpretable, estimable, and flexible enough to capture the important features of diverse real-world network time series. As such, this canonical framework invites and accommodates future approaches to joint network inference.

A Proofs and supporting results for Section 2

Lemma 1. The function d_MV(X, Y) is a metric on the space of random variables, up to the equivalence relation where X ∼ Y if there is some W ∈ O_{d×d} such that X = WY almost surely.

Proof. Clearly, d_MV is symmetric and nonnegative. The triangle inequality holds, since for any Q ∈ O_{d×d} and Z ∈ L^2(Ω), letting W' ∈ O_{d×d} be the minimizer in d_MV(Z, Y), we have
d_MV(X, Y) ≤ ∥X − QW'Y∥_MV ≤ ∥X − QZ∥_MV + ∥Q(Z − W'Y)∥_MV
by the Cauchy–Schwarz inequality applied to the L^2(Ω) inner product. This is further bounded by ∥X − QZ∥_MV + ∥Z − W'Y∥_MV, as the latter term is just d_MV(Z, Y). Since this upper bound holds for any Q ∈ O_{d×d}, it must also hold for the minimizer, so d_MV(X, Y) ≤ d_MV(X, Z) + d_MV(Z, Y). Now suppose that d_MV(X, Y) = 0. Since the spectral norm is a norm, this tells us that for the W ∈ O_{d×d} achieving the minimum, E[(X − WY)(X − WY)^⊤] = 0. Since (X − WY)(X − WY)^⊤ is positive semidefinite for every ω ∈ Ω, this implies that X = WY almost surely.

Recall the setting of Theorem 1: ϕ is a nonbacktracking latent position process, M = ϕ([0, T]), and (M, d_MV) is approximately Euclidean realizable. Theorem 1 states that under these conditions, M is homeomorphic to an interval [0, I]. In particular, it is a topological 1-manifold with boundary. If ϕ is injective, M remains a 1-manifold with boundary even when ϕ is only α-Hölder Euclidean realizable.

Proof of Theorem 1. We consider the case where ϕ is injective first, since this avoids some of the technical details of the more general case. We may first observe that
d_MV(ϕ(x), ϕ(y)) ≤ ∥ψ(x) − ψ(y)∥ + C|x − y|^α ≤ (L + C)|x − y|^α,
where L is the Hölder (or Lipschitz) constant of ψ, so ϕ is also α-Hölder continuous (or Lipschitz if α = 1). Since M is defined to be ϕ([0, T]), it is apparent that ϕ is bijective. Now any closed subset F ⊆ [0, T] is compact, so ϕ(F) ⊆ M is compact, hence closed in M, and ϕ is a closed map. In other words, ϕ^{−1} is continuous, so ϕ is itself the required homeomorphism.

In the case that ϕ is nonbacktracking and α = 1, we define L(t, δ) = sup{d_MV(ϕ(x), ϕ(y))/|x − y| : x, y ∈ B(t, δ), x ≠ y} for any t ∈ (0, T) and δ > 0 such that B(t, δ) ⊆ [0, T]. This is finite and bounded above by L + C from the first part of the proof. We also define L(t) = inf{L(t, δ) : δ > 0}, which is a finite, nonnegative number bounded above by L + C. We make the following observations, which are easily proved: (1) L(t, δ) is lower semicontinuous in t, for any δ > 0; (2) the function γ(t) = ∫_0^t L(s) ds is well defined, nondecreasing, and continuous, with γ(0) = 0 and γ(T) =: I. Since γ is surjective, this allows us to define µ : [0, I] → M via µ(γ(t)) = ϕ(t) for all t ∈ [0, T]. We now show that µ is well-defined and Lipschitz continuous. Let t < t' ∈ [0, T], and given δ > 0, choose points s_i such that s_0 = t, s_k = t', and s_i < s_{i+1} < s_i + δ for each i. Now we observe that
d_MV(ϕ(t), ϕ(t')) ≤ Σ_{i=0}^{k−1} d_MV(ϕ(s_i), ϕ(s_{i+1})) ≤ Σ_{i=0}^{k−1} L(s_i, δ)(s_{i+1} − s_i).
Letting the partition {s_i} of [t, t'] become arbitrarily fine, we see that this upper bound converges to the corresponding integral, giving d_MV(µ(γ(t)), µ(γ(t'))) ≤ ∫_t^{t'} L(s) ds = γ(t') − γ(t). In particular, if γ(t) = γ(t'), then d_MV(ϕ(t), ϕ(t')) = 0, so µ is well defined and Lipschitz. For injectivity, suppose µ(x) = µ(y) for some x < y in [0, I], with x = γ(t) and y = γ(t'). Then d_MV(ϕ(t), ϕ(t')) = 0, and since ϕ is nonbacktracking, ϕ(s) = ϕ(t) for all s ∈ [t, t'].
Then for s ∈ (t, t'), we can take δ > 0 small enough that B(s, δ) ⊆ (t, t'), and since d_MV(ϕ(a), ϕ(b)) = 0 for all a, b ∈ B(s, δ), we see that L(s, δ) = 0, and thus L(s) = 0, too. Now from the definition of γ, γ(t') − γ(t) = ∫_t^{t'} L(s) ds = 0, so x = y, which contradicts the assumption that x < y. Since µ : [0, I] → M is a continuous bijection, it is easy to see that µ is in fact a homeomorphism.

When the trajectories X(·, ω) : [0, T] → R^d satisfy a Hölder condition with square-integrable constant, then ϕ also satisfies this continuity condition, as Theorem 2 states.

Proof of Theorem 2. To show that sufficiently smooth trajectories imply continuity for ϕ, note that
d_MV(ϕ(t), ϕ(s)) ≤ ∥X_t − X_s∥_MV ≤ (E[∥X(t, ·) − X(s, ·)∥^2])^{1/2} ≤ (E[L(ω)^2])^{1/2} |t − s|^α.

Additional constraints on the probabilistic structure of the stochastic process can render the distance d_MV simpler to compute. In Theorem 3, we show that if ϕ(t) = γ(t) + M_t, where γ : [0, T] → R^d is Lipschitz continuous and M_t is a martingale with certain variance constraints, then ϕ is approximately α-Hölder Euclidean realizable.

Proof of Theorem 3. Suppose s < t and W ∈ O_{d×d}, and write ϕ(t) − Wϕ(s) = (γ(t) − Wγ(s)) + (M_t − WM_s). Since the increment M_t − M_s is conditionally mean-zero given F_s, and M_s has mean zero, all cross terms vanish when we take the expected value. This leaves
E[(ϕ(t) − Wϕ(s))(ϕ(t) − Wϕ(s))^⊤] = (γ(t) − Wγ(s))(γ(t) − Wγ(s))^⊤ + E[(M_t − WM_s)(M_t − WM_s)^⊤].
Plugging in W = I and using the triangle inequality guarantees that
d_MV(ϕ(t), ϕ(s))^2 ≤ ∥γ(t) − γ(s)∥^2 + ∥Cov(M_t − M_s)∥_2 ≤ ∥γ(t) − γ(s)∥^2 + C(t − s).
Since γ(s) ∈ span(γ(t)), we use the fact that both terms on the right-hand side are positive semidefinite to obtain, for any W, the lower bound (∥γ(t)∥ − ∥γ(s)∥)^2 = ∥γ(t) − γ(s)∥^2, so
∥γ(t) − γ(s)∥^2 ≤ d_MV(ϕ(t), ϕ(s))^2 ≤ ∥γ(t) − γ(s)∥^2 + C(t − s),
and taking ψ(t) = a(t)∥v∥ gives 0 ≤ d_MV(ϕ(t), ϕ(s)) − |ψ(t) − ψ(s)| ≤ √(C(t − s)), which completes the proof.

We now demonstrate the properties of the stochastic processes described in Examples 1 and 2.

Proof for Example 1. We may expand I_t − WI_{t'} as I_t − I_{t'} + (I − W)I_{t'}. Using the fact that E[B_s | F_{s'}] = B_{s'} whenever s > s', we can compute the expected outer product of I_t − WI_{t'}; the term involving (I − W) is positive semidefinite, so the spectral norm of this matrix is minimized at W = I. For the term γ(t) − Wγ(t') = (at + b)v − (at' + b)Wv, by Cauchy–Schwarz, the lower bound is achieved with W = I since γ(t) and γ(t') are linearly dependent. In Equation 3, we obtain a lower bound by discarding the second term of Equation 4. Since the remaining portion of Equation 4 is just a multiple of the identity, we use the identity ∥XX^⊤ + βI∥_2 = ∥XX^⊤∥_2 + β (for β > 0) to finally obtain an equality for d_MV(ϕ(t), ϕ(t')). We see that ϕ is approximately Euclidean realizable with ψ(t) = (a^2∥v∥^2 + σ^2 T)^{1/2} t, since, using a^2 − b^2 = (a − b)(a + b), the approximation error is of order |t − t'|. Note that I_t is not a martingale, but as this example demonstrates, the stochastic term need not be. Moreover, the increased regularity of integrated Brownian motion guarantees approximate Lipschitz Euclidean realizability.

If we consider processes expressible as the sum of a deterministic drift and standard Brownian motion, we retain α-Hölder Euclidean realizability, as Example 2 asserts.

Proof for Example 2. We may proceed as in the proof of Theorem 3. Since B_t − B_{t'} and B_{t'} are independent and have mean 0, the cross terms again vanish, and since the last term inside the norm is positive semidefinite, we obtain a lower bound; arguing as in Example 1, we see that this is minimized at W = I. As before, this gives the claimed realizability with α = 1/2.

We will make use of the following supporting lemmas in our proof of Theorem 4. The first says that the property of equicontinuity for functions is preserved under convex combinations; we omit the straightforward proof.

Lemma 2. Let (A, d) be a metric space, and let C be a collection of functions f : A → R such that for any ε > 0, there exists δ(ε) > 0 such that for all f ∈ C and all a, a' ∈ A with d(a, a') < δ(ε), we have |f(a) − f(a')| ≤ ε. That is, C is an equicontinuous family of functions.
Then the convex hull of C, conv(C), is an equicontinuous family of functions with this same δ(ε). That is, given f_1, ..., f_n ∈ C and λ_1, ..., λ_n ≥ 0 with Σ_{i=1}^n λ_i = 1, the function Σ_{i=1}^n λ_i f_i is uniformly continuous with the same modulus of continuity (at most) δ(ε) for all ε > 0.

The following lemma states that when two functions are uniformly close, their maximum and minimum values must also be close.

Lemma 3. Let f, g : A → R be bounded functions with sup_{a ∈ A} |f(a) − g(a)| ≤ ε. Then |sup_a f(a) − sup_a g(a)| ≤ ε and |inf_a f(a) − inf_a g(a)| ≤ ε.

Since the mean edge probability matrices of latent position graphs have low-rank structure, the underlying latent position matrices can be well estimated by the adjacency spectral embedding [17, 2]. In particular, the difference between the Procrustes-aligned adjacency spectral embeddings of two independent networks satisfies the following concentration [3, 18]:

Lemma 4. Let F be an inner product distribution in R^d, and suppose X_1, ..., X_n ∼ F are an i.i.d. sample. Suppose that E[X_1 X_1^⊤] has rank d. If A is the adjacency matrix of an RDPG with latent position matrix X having rows X_i, 1 ≤ i ≤ n, and X̂ is its d-dimensional ASE, then there is a constant C such that, with overwhelming probability, min_{W ∈ O_{d×d}} ∥X̂ − XW∥_F ≤ C.

We now turn to our main result, Theorem 4, which says that with overwhelming probability, d̂_MV(X̂_t, X̂_s) and d_MV(X_t, X_s) are close. A crucial step is the proof of a concentration inequality for the scaled distance between the realized latent position matrices X_t and X_s and the maximum directional variation metric between X_t and X_s, whose joint distribution is inherited from the latent position process ϕ. For a given W ∈ O_{d×d} and u ∈ R^d with ∥u∥ = 1, classical results show that (1/n)∥(X_t − X_s W)u∥^2 concentrates around E[⟨X_t − WX_s, u⟩^2]. The challenge, however, is the maximization over u and minimization over W. This necessitates a uniform concentration bound in (W, u), which relies on a carefully constructed cover for the compact set O_{d×d} × S^d. We show this pointwise concentration can be extended uniformly over small neighborhoods of a given (W, u), after which a union bound gives the desired result.

Proof of Theorem 4. Consider the matrices of true latent positions, X_t and X_s, and denote the rows of these matrices as (X_i, Y_i), i = 1, ..., n. These rows are an i.i.d. sample from some latent position distribution F. We first show that, with overwhelming probability, we have the bound
|d̂_MV(X_t, X_s)^2 − d_MV(X_t, X_s)^2| ≤ log(n)/√n. (5)
From the definition of d̂_MV, we have
d̂_MV(X_t, X_s)^2 = min_{W ∈ O_{d×d}} max_{∥u∥=1} (1/n) Σ_{i=1}^n ⟨X_i − WY_i, u⟩^2.
Defining f((x, y), W, u) = ⟨x − Wy, u⟩^2 and d((W, u), (W', u')) = max{∥W − W'∥_2, ∥u − u'∥}, it is easy to show that |f((x, y), W, u) − f((x, y), W', u')| ≤ 12 d((W, u), (W', u')). Moreover,
Z_n(ω, W, u) = (1/n) Σ_{i=1}^n ⟨X_i(ω) − WY_i(ω), u⟩^2 ∈ conv({f((x, y), ·, ·) : x ∈ supp(F_t), y ∈ supp(F_s)}).
If we define µ(W, u) = E[Z_n(·, W, u)], then by Lemma 2, {Z_n(ω, ·, ·) − µ(·, ·) : n ≥ 1, ω ∈ Ω} is an equicontinuous family of functions in (W, u) with modulus of continuity δ(ε) ≥ ε/24. Let (W, u) be fixed, and consider N = N_{W,u} = {(W', u') : d((W', u'), (W, u)) < δ(ε/2)}. If |Z_n(ω, W, u) − µ(W, u)| < ε/2, then for (W', u') ∈ N,
|Z_n(ω, W', u') − µ(W', u')| ≤ |Z_n(ω, W', u') − µ(W', u') − (Z_n(ω, W, u) − µ(W, u))| + |Z_n(ω, W, u) − µ(W, u)| < ε/2 + ε/2 = ε.
So we have the following containment:
{ω : |Z_n(ω, W, u) − µ(W, u)| < ε/2} ⊆ {ω : |Z_n(ω, W', u') − µ(W', u')| < ε for all (W', u') ∈ N} =: A_{n,N,ε}.
By Hoeffding's inequality, P[|Z_n(·, W, u) − µ(W, u)| ≥ ε/2] ≤ 2 exp(−nε^2/136). Then for given γ > 0, this probability is ≤ γ for n ≥ (136/ε^2) log(2/γ), independent of the particular choice (W, u). The previous containment says that for this same n, we also have
P[A^c_{n,N,ε}] ≤ γ. (6)
By compactness of O_{d×d} × S^d, we may extract a finite subcover {N_i}_{i=1}^k from the open cover {N_{W,u}}_{(W,u)}.
In fact, we may take k ≤ (4d^{3/4}/δ(ε/2))^{2d}, and since δ(ε) ≥ ε/24, this gives k ≤ (192 d^{3/4}/ε)^{2d}. Fix ε, γ > 0. If we let N = max{N_i(ε, γ/k) : 1 ≤ i ≤ k}, where N_i(ε, γ/k) ensures that the inequality (6) holds for the neighborhood N_i (with γ/k on the right-hand side), then in the worst case, we have N = (136/ε^2) log(2k/γ), which from the bound on k gives N ≤ (136/ε^2)(log(2/γ) + (2d) log(192 d^{3/4}/ε)). But for n ≥ N, we have
P[|Z_n(ω, W, u) − µ(W, u)| < ε for all (W, u) ∈ O_{d×d} × S^d] ≥ 1 − γ. (7)
In particular, taking ε = log(n)/√n, we may take γ = n^{−c} for any c > 0 and find that for n sufficiently large, the inequality (7) holds, since N = o(n). Now given ω for which |Z_n(ω, W, u) − µ(W, u)| < ε for all (W, u), we see from Lemma 3 that |max_u Z_n(ω, W, u) − max_{u'} µ(W, u')| < ε. We may now regard max_u Z_n(ω, W, u) as a function only of W, and similarly for µ, so applying the lemma again yields |min_W max_u Z_n(ω, W, u) − min_{W'} max_{u'} µ(W', u')| < ε. From Inequality (7), we obtain the desired bound (5). Now by Lemma 4, with overwhelming probability, the ASEs for the corresponding adjacency matrices satisfy, for r ∈ {t, s}, d̂_MV(X̂_r, X_r) ≤ C_r/√n + O(log(n)/n). Since both terms are no larger than constant order (so certainly less than log(n)), we can use |a^2 − b^2| = |a − b||a + b| to show that
|d̂_MV(X̂_t, X̂_s)^2 − d̂_MV(X_t, X_s)^2| ≤ C log(n)/√n.
Combining this bound with Equation (5) completes the proof.

Proof of Theorem 5. Applying Theorem 4 to each entry of the matrix D̂_ψ and taking a union bound, we see that with overwhelming probability, ∥D̂^(2)_ψ − D^(2)_ψ∥_F ≤ C m_𝒯 log(n)/√n. Since D_ψ is a Euclidean distance matrix for points in R^c, the matrix −(1/2) P D^(2)_ψ P has rank c, so λ_{c+1}(−(1/2) P D^(2)_ψ P) = 0. Then by [21], we obtain the stated bound on ∥Û − UR∥ in terms of ∥D̂^(2)_ψ − D^(2)_ψ∥ and the eigengap of −(1/2) P D^(2)_ψ P. The bound for scaled distances follows as in [13], using the fact that U^⊤(−(1/2) P D^(2)_ψ P) U = S.

References
[1] Inference for multiple heterogeneous networks with a common invariant subspace.
[2] Statistical inference on random dot product graphs: a survey.
[3] Numerical tolerance for spectral decompositions of random matrices.
[4] Modern multidimensional scaling: Theory and applications.
[5] Spectral embedding for dynamic networks with stability guarantees.
[6] Latent space approaches to social network analysis.
[7] Community detection on mixture multilayer networks via regularized tensor decomposition.
[8] The multilayer random dot product graph.
[9] Multiscale analysis of time series of graphs.
[10] Consistent community detection in multi-layer network data.
[11] Bias-adjusted spectral clustering in multi-layer stochastic block models.
[12] A central limit theorem for an omnibus embedding of random dot product graphs.
[13] Community detection and classification in hierarchical stochastic blockmodels.
[14] The importance of being correlated: Implications of dependence in joint spectral inference across multiple networks.
[15] Dynamic network models and graphon estimation.
[16] Graph encoder embedding. CoRR, abs.
[17] A consistent adjacency spectral embedding for stochastic blockmodel graphs.
[18] A semiparametric two-sample hypothesis testing problem for random dot product graphs.
[19] A global geometric framework for nonlinear dimensionality reduction.
[20] From Louvain to Leiden: guaranteeing well-connected communities.
[21] A useful variant of the Davis–Kahan theorem for statisticians.
[22] Tensor SVD: Statistical and computational limits.
[23] Dynamic silos: Increased modularity in intra-organizational communication networks during the COVID-19 pandemic.