Dynamic Network Regression
Yidong Zhou and Hans-Georg Müller
September 7, 2021

Network data are increasingly available in various research fields, motivating statistical analysis for populations of networks, where a network as a whole is viewed as a data point. Due to the non-Euclidean nature of networks, basic statistical tools available for scalar and vector data are no longer applicable when one aims to relate networks as outcomes to Euclidean covariates, while the study of how a network changes in dependence on covariates is often of paramount interest. This motivates the extension of the notion of regression to the case of responses that are network data. Here we propose to adopt conditional Fréchet means, implemented with both global least squares regression and local weighted least squares smoothing, extending the Fréchet regression concept to networks that are quantified by their graph Laplacians. The challenge is to characterize the space of graph Laplacians so as to justify the application of Fréchet regression. This characterization then leads to asymptotic rates of convergence for the corresponding M-estimators by applying empirical process methods. We demonstrate the usefulness and good practical performance of the proposed framework with simulations and with network data arising from NYC taxi records, as well as resting-state fMRI in neuroimaging.

1 Introduction

Network data are increasingly encountered in the form of dynamic networks (Dubey and Müller 2021) and arise in the analysis of brain connectivity, traffic mobility, and many other areas. Several recent studies focus on the analysis of a population of networks. For example, a geometric framework for inference concerning a population of networks was introduced and complemented by asymptotic theory for network averages in Ginestet et al. (2017) and Kolaczyk et al. (2020).
A similar framework for studying populations of networks, where the graph space is viewed as the quotient space of a Euclidean space with respect to a finite group action, was studied in Calissano et al. (2020). A challenging and commonly encountered problem is to model the relationship between network objects and one or more explanatory variables, which corresponds to a regression problem. Instead of directly modeling the network objects, matrix representations of networks, such as adjacency matrices and graph Laplacians, provide useful characterizations of the space of networks. A general framework for the statistical analysis of populations of labeled, undirected, weighted, and simple networks was developed by identifying networks with the corresponding graph Laplacians and embedding the space of graph Laplacians in a Euclidean feature space, where statistical analysis, including linear regression, is carried out using extrinsic methods (Severn et al. 2019). Non-parametric regression for networks based on the same framework was proposed subsequently by adopting Nadaraya-Watson kernel estimators (Severn et al. 2021). However, embedding methods suffer from losing much of the relational information, due to the non-Euclidean structure of the space of networks, and from assigning nonzero probability to points in the embedding space that do not represent networks. Another practical issue in this context is the need to estimate a covariance matrix with a very large number of parameters, of the order of m^4, where m is the number of nodes in the network. Based on the adjacency matrix of a network, Calissano et al. (2021) proposed a network-valued regression model by implementing linear regression in the Euclidean space and then projecting back to the "graph space" through a quotient map (Calissano et al. 2020).
This model is widely applicable for various kinds of unlabeled networks, but for the regression case there is no supporting theory, and this approach may not be suitable for labeled networks, which are prevalent in applications. Examples include brain functional connectivity networks (Fornito et al. 2016), which are typically labeled, as the goal is to study the role of different regions of interest (ROIs) in brain activity. To circumvent the problems of embedding methods and provide theoretical support for network-valued regression, we introduce a unifying intrinsic framework, Dynamic Network Regression (DNR), by adopting conditional Fréchet means. Specifically, let G = (V, E) be a network with a set of nodes V and a set of edge weights E. Under the assumption that G is labeled and simple (i.e., there are no self-loops or multi-edges), one can uniquely associate each network G with its graph Laplacian L. Consider random pairs (X, L) ∼ F, where X takes values in R^p, L is a graph Laplacian, and F indicates a suitable probability law. We investigate the dependence of L on covariates of interest X by adopting the general framework of Fréchet regression (Petersen and Müller 2019). The pointwise and uniform rates of convergence for the corresponding M-estimators are obtained through a precise characterization of the space of graph Laplacians combined with empirical process methods. We demonstrate the potential utility and flexibility of the proposed Dynamic Network Regression (DNR) with New York taxi data and also with fMRI data obtained from the ADNI neuroimaging study. The New York taxi data give rise to transport networks that are constructed from yellow taxi records for the days from April 12, 2020 to September 30, 2020, along with covariates including COVID-19 information. The neuroimaging data reflect brain functional networks constructed from resting-state functional magnetic resonance imaging (rs-fMRI), where the age of the subject is used as covariate.
The organization of this paper is as follows. In Section 2, we provide a precise characterization of the space of graph Laplacians and discuss potential metrics for this space. The proposed Dynamic Network Regression (DNR) for network responses and vector covariates is introduced in Section 3. The pointwise and uniform rates of convergence for the estimators are established in Section 4. Computational details and simulation results for a sample of networks are presented in Section 5. The proposed framework is illustrated in Section 6 using the New York yellow taxi records and rs-fMRI data from the ADNI study. Detailed theoretical proofs and auxiliary results are in the Supplementary Material.

Let G_m = (V, E) be a network with a set of nodes V = {v_1, v_2, ..., v_m} and a set of edge weights E = {w_ij : w_ij ≥ 0, 1 ≤ i, j ≤ m}, where w_ij = 0 indicates that v_i and v_j are unconnected. Some basic and mild restrictions on the networks G_m we consider here are as follows.

(A0) G_m is simple, i.e., there are no self-loops or multi-edges.
(A1) G_m is weighted, undirected, and labeled.
(A2) The edge weights w_ij are bounded above by W ≥ 0, i.e., 0 ≤ w_ij ≤ W.

Assumption (A0) is required for the one-to-one correspondence between a network G_m and its graph Laplacian, which is our central tool to represent networks. Assumption (A1) guarantees that the adjacency matrix A = (w_ij) is symmetric, i.e., w_ij = w_ji for all i, j, which is necessary for our theoretical developments. Assumption (A2) puts a limit on the maximum strength of the connection between two nodes and prevents extremes. Any network satisfying assumptions (A0)-(A2) can be uniquely associated with its graph Laplacian L = (l_ij), defined as

l_ij = −w_ij for i ≠ j, and l_ii = Σ_{k≠i} w_ik, for 1 ≤ i, j ≤ m,

which motivates characterizing the space of networks through the corresponding space of graph Laplacians, given by

L_m = {L = (l_ij) : L = L⊤, L1_m = 0_m, −W ≤ l_ij ≤ 0 for i ≠ j},   (1)

where 1_m and 0_m are the m-vectors of ones and zeroes, respectively.
For every L ∈ L_m, (P0) L is symmetric, L = L⊤; (P1) the entries in each row sum to 0, L1_m = 0_m; (P2) the off-diagonal entries are nonpositive and bounded below, −W ≤ l_ij ≤ 0 for i ≠ j. Another well-known property of graph Laplacians is their positive semi-definiteness, x⊤Lx ≥ 0 for all x ∈ R^m, which immediately follows from properties (P0)-(P2), as any such L is diagonally dominant, i.e., l_ii = Σ_{j≠i} |l_ij| (De Klerk 2006, Page 232). A precise geometric characterization of the space of graph Laplacians can be found in Ginestet et al. (2017). The characterizations in that paper were however limited to the case of graph Laplacians with fixed rank, which is not practicable when considering network-valued regression, where the rank can change in dependence on predictor levels. This necessitates and motivates the study of the space of graph Laplacians with no restrictions on their rank in the following.

Proposition 1. The space L_m, defined in (1), is a bounded, closed, and convex subset of R^{m^2} of dimension m(m − 1)/2.

All proofs are given in the Supplementary Material. Proposition 1 ensures the existence and uniqueness of projections onto L_m that we will utilize in the proposed regression approach. One can adopt one of several metrics when viewing the space of graph Laplacians L_m as a metric space. A common choice is the Frobenius metric, defined as

d_F(L_1, L_2) = ‖L_1 − L_2‖_F,

which is simply the Euclidean metric if the matrix is stretched to a long vector of length m^2. We will refer to d_F as the Euclidean metric for simplicity. While d_F is the simplest of the possible metrics on the space of graph Laplacians, it is not necessarily the most appropriate metric, depending on the specific application setting. A downside of the metric d_F is that the determinant typically inflates along geodesics, which is undesirable and is referred to as the swelling phenomenon (Arsigny et al. 2007; Lin 2019). Several non-Euclidean metrics have been proposed as alternatives to d_F.
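The characterization (P0)-(P2) and the Frobenius metric d_F can be illustrated with a minimal numpy sketch; the 4-node networks below are hypothetical toy data, not from the paper:

```python
import numpy as np

def graph_laplacian(A):
    """Graph Laplacian L = D - A of a simple, undirected, weighted network."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def in_laplacian_space(L, W_max, tol=1e-10):
    """Check properties (P0)-(P2): symmetry, zero row sums, off-diagonals in [-W_max, 0]."""
    off = L - np.diag(np.diag(L))
    return (np.allclose(L, L.T, atol=tol)                         # (P0) symmetry
            and np.allclose(L.sum(axis=1), 0.0, atol=tol)         # (P1) zero row sums
            and off.max() <= tol and off.min() >= -W_max - tol)   # (P2) bounds

def d_F(L1, L2):
    """Frobenius (Euclidean) metric between two graph Laplacians."""
    return np.linalg.norm(L1 - L2, ord='fro')

# toy 4-node networks with edge weights bounded by W = 1 (hypothetical data)
A1 = np.array([[0, .5, 0, .2], [.5, 0, .3, 0], [0, .3, 0, .4], [.2, 0, .4, 0]])
A2 = np.array([[0, .1, 0, .2], [.1, 0, .6, 0], [0, .6, 0, .4], [.2, 0, .4, 0]])
L1, L2 = graph_laplacian(A1), graph_laplacian(A2)
print(in_laplacian_space(L1, W_max=1.0))  # True
```

For adjacency matrices satisfying (A0)-(A2) this construction always lands in L_m, so the membership check is a useful sanity test when preparing network data.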
Denote the space of real symmetric positive semi-definite m × m matrices by S⁺_m. Note that the space of graph Laplacians L_m is a subset of S⁺_m. Let S = UΛU⊤ be the usual spectral decomposition of S ∈ S⁺_m, with U ∈ O_m an orthogonal matrix and Λ diagonal with non-negative entries. Defining matrix power maps

F_α(S) = UΛ^α U⊤,   (3)

where the power α > 0 is a constant, and noting that F_α is bijective with inverse F_{1/α}, the Euclidean power metric (Dryden et al. 2009) between graph Laplacians is

d_{F,α}(L_1, L_2) = ‖F_α(L_1) − F_α(L_2)‖_F.

For α = 1, d_{F,α} reduces to the Euclidean metric d_F. For larger α there is more emphasis on larger entries of graph Laplacians, while for small α large and small entries are treated more evenly and there is less sensitivity to outliers. In particular, d_{F,α} with 0 < α < 1 is associated with a reduced swelling effect, while d_{F,α} with α > 1 in contrast will amplify it and thus often will be unfavorable. For α = 1/2, the Euclidean square root metric d_{F,1/2} is a canonical choice that has been widely studied (Dryden et al. 2009, 2010; Zhou et al. 2016; Severn et al. 2019; Tavakoli et al. 2019). For example, Dryden et al. (2010) studied different values of α in the context of diffusion tensor imaging and ended up with the choice α = 1/2; also Tavakoli et al. (2019) illustrated the advantages of using d_{F,1/2} through a spatial modeling approach for linguistic object data. In our applications, we likewise focus on the Euclidean square root metric due to its promising properties and compare its performance with the Euclidean metric d_F; see also Petersen and Müller (2016) regarding the choice of α. Another non-Euclidean metric, arising from the area of shape analysis, the Procrustes metric, has recently received some attention (Dryden and Mardia 2016). Formally, the Procrustes metric is defined as

d_P(L_1, L_2) = inf_{R ∈ O_m} ‖L_1^{1/2} − L_2^{1/2} R‖_F,

where R is an orthogonal matrix; it involves matching L_2 optimally to L_1 by rotation and reflection.
Since graph Laplacians are all centered, applying the Procrustes metric retains only the scale information (Dryden and Mardia 2016, Chapter 5). However, removal of rotation and reflection information for graph Laplacians is not desirable, as nodes may be relabeled or combined in the Procrustes matching step, which is a problem when the focus is on labeled networks, as in this paper. We therefore do not consider this metric further.

Consider a random object Y ∼ F_Y taking values in a metric space (Ω, d). Under appropriate conditions, the Fréchet mean and Fréchet variance of random objects in metric spaces (Fréchet 1948), as generalizations of the usual notions of mean and variance, are defined as

ω⊕ = argmin_{ω ∈ Ω} E[d²(Y, ω)], V⊕ = E[d²(Y, ω⊕)],

where the existence and uniqueness of the minimizer depend on structural properties of the underlying metric space and are guaranteed for Hadamard spaces. A general approach for regression of metric space-valued responses on Euclidean predictors, referred to as Fréchet regression, was introduced in Petersen and Müller (2019).

Suppose (X, L) ∼ F is a random pair, where X and L take values in R^p and L_m ≡ (L_m, d), respectively, and F is the joint distribution of (X, L) on R^p × L_m. We denote the marginal distributions of X and L as F_X and F_L, respectively, and assume that µ = E(X) and Σ = Var(X) exist, with Σ positive definite. The conditional distributions F_{X|L} and F_{L|X} are also assumed to exist. The conditional Fréchet mean, which corresponds to the regression function of L given X = x, is

m(x) = argmin_{ω ∈ L_m} M(ω, x), M(ω, x) = E[d²(L, ω) | X = x],   (7)

where M(·, x) is referred to as the conditional Fréchet function. Further suppose that (X_k, L_k) ∼ F, k = 1, 2, ..., n, are independent and define X̄ = n⁻¹ Σ_{k=1}^n X_k and Σ̂ = n⁻¹ Σ_{k=1}^n (X_k − X̄)(X_k − X̄)⊤. The global DNR given X = x is defined as

m_G(x) = argmin_{ω ∈ L_m} E[s_G(x) d²(L, ω)],   (8)

where s_G(x) = 1 + (X − µ)⊤Σ⁻¹(x − µ). The corresponding sample version is

m̂_G(x) = argmin_{ω ∈ L_m} n⁻¹ Σ_{k=1}^n s_kG(x) d²(L_k, ω),   (9)

where s_kG(x) = 1 + (X_k − X̄)⊤Σ̂⁻¹(x − X̄). For local DNR, we present details only for the case of a scalar predictor X ∈ R, where the extension to X ∈ R^p with p > 1 is straightforward, while estimates will be subject to the curse of dimensionality.
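Under the Euclidean metric d_F, the global DNR criterion is a weighted least squares problem whose unconstrained minimizer is simply the weighted average of the observed Laplacians. A minimal numpy sketch for a scalar predictor follows; the data are hypothetical, and the subsequent projection onto L_m that the full method requires is omitted:

```python
import numpy as np

def global_weights(X, x):
    """Global Fréchet regression weights s_kG(x) = 1 + (X_k - X̄) Σ̂⁻¹ (x - X̄)
    for a scalar predictor, with Σ̂ the (biased) sample variance."""
    X = np.asarray(X, dtype=float)
    Xbar, var = X.mean(), X.var()
    return 1.0 + (X - Xbar) * (x - Xbar) / var

def global_dnr_unprojected(Ls, X, x):
    """Weighted average B̄_G(x) = n⁻¹ Σ_k s_kG(x) L_k, the minimizer of the
    global DNR criterion over all symmetric matrices; the DNR estimate m̂_G(x)
    is its projection onto L_m (a quadratic program, omitted in this sketch)."""
    s = global_weights(X, x)
    return np.tensordot(s, np.asarray(Ls, dtype=float), axes=(0, 0)) / len(s)

# hypothetical data: Laplacians exactly linear in the predictor
L0 = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
X = np.array([1., 2., 3., 4.])
Ls = [xk * L0 for xk in X]
print(np.allclose(global_dnr_unprojected(Ls, X, 3.0), 3.0 * L0))  # True
```

Because these weights reproduce linear trends exactly (they rewrite linear regression as a weighted mean), the fitted value at x = 3.0 recovers 3.0·L0 here; for real data the projection onto L_m restores the Laplacian constraints.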
Consider a smoothing kernel K(·) corresponding to a probability density and K_h(·) = h⁻¹K(·/h) with h a bandwidth. The local DNR given X = x can be written as

m_{L,h}(x) = argmin_{ω ∈ L_m} E[s_L(x, h) d²(L, ω)],   (10)

where s_L(x, h) = [K_h(X − x){µ₂ − µ₁(X − x)}]/σ₀², with µ_j = E[K_h(X − x)(X − x)^j] for j = 0, 1, 2, and σ₀² = µ₀µ₂ − µ₁². The corresponding sample version is

m̂_{L,n}(x) = argmin_{ω ∈ L_m} n⁻¹ Σ_{k=1}^n s_kL(x, h) d²(L_k, ω).   (11)

Here s_kL(x, h) = [K_h(X_k − x){µ̂₂ − µ̂₁(X_k − x)}]/σ̂₀², where µ̂_j = n⁻¹ Σ_{k=1}^n K_h(X_k − x)(X_k − x)^j and σ̂₀² = µ̂₀µ̂₂ − µ̂₁². The dependency on n is through the bandwidth sequence h = h_n. We note that for the case where X ∈ R^p with p > 1, the weight function takes a slightly different but analogous form.

In this section, we establish the consistency of both global and local DNR estimators as per (9) and (11) using the metrics of interest introduced in Section 2.2. Both pointwise and uniform rates of convergence are obtained under the framework of M-estimation. These rates are validated by simulations in Section 5, where we find that the finite sample results are entirely consistent with theory (see Figure 3). We first consider the case where L_m is endowed with the Euclidean metric d_F. As a subset of R^{m^2}, L_m inherits a natural metric from R^{m^2}, which coincides with d_F. The convexity and closedness of L_m imply that the minimizers m̂_G(x) and m̂_{L,n}(x) defined in (9) and (11) exist and are unique for any x. The following result formalizes the consistency of the proposed global DNR estimator and provides rates of convergence.

Theorem 1. Let the space of graph Laplacians L_m be endowed with the Euclidean metric d_F. Then for a fixed x ∈ R^p, it holds for m_G(x) and m̂_G(x) as per (8) and (9) that

d_F(m_G(x), m̂_G(x)) = O_P(n^{−1/2}).   (12)

Furthermore, for a given B > 0,

sup_{‖x‖_E ≤ B} d_F(m_G(x), m̂_G(x)) = O_P(n^{−1/(2(1+ε))})

for any ε > 0.

As in the Euclidean case, where one needs to accept bias in exchange for the greater flexibility of non-parametric approaches, the rate of convergence for the local DNR estimator depends both on the bias d_F(m(x), m_{L,h}(x)) and the stochastic deviation d_F(m_{L,h}(x), m̂_{L,n}(x)); see the following result.

Theorem 2. Let the space of graph Laplacians L_m be endowed with the Euclidean metric d_F.
Suppose (LP0) and (LP1) hold. Then for a fixed x ∈ R, it holds for m(x), m_{L,h}(x), and m̂_{L,n}(x) as per (7), (10), and (11), respectively, that

d_F(m(x), m_{L,h}(x)) = O(h²), d_F(m_{L,h}(x), m̂_{L,n}(x)) = O_P((nh)^{−1/2}).

With h ∼ n^{−1/5}, it holds that

d_F(m(x), m̂_{L,n}(x)) = O_P(n^{−2/5}).

Furthermore, suppose (LU0) and (LU1) hold. For a given closed interval T, if h → 0 and nh²(−log h)⁻¹ → ∞ as n → ∞, then for any ε > 0,

sup_{x ∈ T} d_F(m_{L,h}(x), m̂_{L,n}(x)) = O_P({nh²(−log h)⁻¹}^{−1/(2(1+ε))}).

The proofs of Theorem 1 and Theorem 2 rely on the fact that L_m is convex, bounded, and of finite dimension. We represent global and local DNRs as projections P_{L_m} onto L_m,

m̂_G(x) = P_{L_m}(B̄_G(x)), m̂_{L,n}(x) = P_{L_m}(B̄_{L,n}(x)),

where B̄_G(x) = n⁻¹ Σ_{k=1}^n s_kG(x)L_k and B̄_{L,n}(x) = n⁻¹ Σ_{k=1}^n s_kL(x, h)L_k.

We now turn our attention to the Euclidean power metric d_{F,α}. Recall that the graph Laplacian L is centered and the off-diagonal entries are bounded by W as per (1). By the equivalence of the Frobenius norm and the l₂-operator norm in R^{m^2}, it immediately follows that the largest eigenvalue of L is bounded, say by C, a nonnegative constant depending on m and W. Define the embedding space M_m to be a subset of S⁺_m,

M_m = {S ∈ S⁺_m : λ₁(S) ≤ C^α},   (24)

where λ₁(S) is the largest eigenvalue of S. Note that the image of L_m under the matrix power map F_α, i.e., F_α(L_m), is a subset of M_m as a consequence of the bound C on the largest eigenvalue of each graph Laplacian. After applying the matrix power map F_α, the image of L_m is embedded in M_m, where DNR is carried out using the Euclidean metric d_F. When transforming back from the embedding space M_m to L_m, we first apply the inverse matrix power map F_{1/α} and then a projection P_{L_m} onto L_m. The general idea involving embedding, mapping, and projections is shown schematically in Figure 1.

[Figure 1: Schematic of DNR under the Euclidean power metric. L_m and M_m are defined in (1) and (24), respectively. DNR is carried out using the Euclidean metric d_F in the embedding space M_m; F_α is the matrix power map defined in (3).]

The global DNR in the embedding space M_m using the Euclidean metric d_F can likewise be simplified to a projection P_{M_m} onto M_m,

m^α_G(x) = P_{M_m}(E[s_G(x)F_α(L)]).

Then the global DNR in the space of graph Laplacians L_m using the Euclidean power metric d_{F,α} is obtained by applying the inverse matrix power map F_{1/α} and a projection P_{L_m} successively to m^α_G(x). That is (with a slight abuse of notation),

m_G(x) = P_{L_m}(F_{1/α}(m^α_G(x))).   (26)

The corresponding sample versions are

m̂^α_G(x) = P_{M_m}(n⁻¹ Σ_{k=1}^n s_kG(x)F_α(L_k)), m̂_G(x) = P_{L_m}(F_{1/α}(m̂^α_G(x))).   (28)

Similarly, for the local DNR,

m_{L,h}(x) = P_{L_m}(F_{1/α}(m^α_{L,h}(x))), m̂_{L,n}(x) = P_{L_m}(F_{1/α}(m̂^α_{L,n}(x))),

with m^α_{L,h}(x) and m̂^α_{L,n}(x) defined analogously using the weights s_L(x, h) and s_kL(x, h).

To obtain rates of convergence for power metrics d_{F,α}, we need the following proposition concerning the Hölder continuity of the matrix power map F_α. Suppose that U is a set in R^{n₁}, E is a non-empty subset of U, and 0 < β ≤ 1. A function g: U → R^{n₂} is uniformly Hölder continuous with exponent β and Hölder coefficient H in the set E, shortly (β, H)-Hölder continuous, if ‖g(x) − g(y)‖ ≤ H‖x − y‖^β for all x, y ∈ E. For β = 1, the function g is said to be Lipschitz continuous in E with Lipschitz constant H, shortly H-Lipschitz continuous.

Proposition 2. On a bounded set E_m ⊂ S⁺_m whose elements have largest eigenvalue bounded by C, the matrix power map F_α is (1) (α, H)-Hölder continuous in E_m for some H > 0 if 0 < α < 1, and (2) αC^{α−1}-Lipschitz continuous in E_m for α ≥ 1.

Proposition 2 leads to rate of convergence results for the global and local DNR estimators defined in (28) and (31), where the population targets for the global and local DNR under the Euclidean power metric d_{F,α} are defined in (26) and (29).

Theorem 3. If the space of graph Laplacians L_m is endowed with the Euclidean power metric d_{F,α}, then for any fixed x ∈ R^p, it holds for m_G(x) and m̂_G(x) as per (26) and (28) that

d_F(m_G(x), m̂_G(x)) = O_P(n^{−1/2}) for 0 < α ≤ 1, and d_F(m_G(x), m̂_G(x)) = O_P(n^{−1/(2α)}) for α > 1.

Furthermore, for a given B > 0, sup_{‖x‖_E ≤ B} d_F(m_G(x), m̂_G(x)) = O_P(n^{−1/(2(1+ε))}) for 0 < α ≤ 1 and O_P(n^{−1/(2α(1+ε))}) for α > 1, for any ε > 0.

Theorem 4. Suppose the space of graph Laplacians L_m is endowed with the Euclidean power metric d_{F,α} and that (LP0) and (LP1) hold. Then for a fixed x ∈ R, the bias and stochastic deviation of the local DNR target and estimators as per (29), (30), and (31), respectively, satisfy the analogues of Theorem 2, with the rates raised to the power min(1, 1/α). With h ∼ n^{−1/5}, it holds that

d_F(m(x), m̂_{L,n}(x)) = O_P(n^{−2/5}) for 0 < α ≤ 1 and O_P(n^{−2/(5α)}) for α > 1.

With (LU0) and (LU1) and the bandwidth conditions of Theorem 2, a corresponding uniform rate over a closed interval T holds with the same modification.

For the Euclidean power metric d_{F,α}, rates of convergence for both the global and local DNR thus depend on the choice of power α.
Specifically, rates of convergence are the same as those for the Euclidean metric d_F if 0 < α < 1, providing theoretical justification for the use of the Euclidean power metric with 0 < α < 1. We note that the convexity of the target space is crucial in the proof of existence and uniqueness for the minimizers in (7)-(11). Indeed, as stated in Deutsch (2012, Chapter 12), every Chebyshev subset of a finite-dimensional Hilbert space is convex. Let U be a nonempty subset of the Hilbert space X; then U is called a Chebyshev subset if each x ∈ X has exactly one best approximation in U. It can be shown that F_α(L_m), as a subset of R^{m^2}, is not convex, implying that it is not a Chebyshev subset. Hence uniqueness of the minimizers in (7)-(11) cannot be guaranteed. For this reason, we embed F_α(L_m) in M_m as defined in (24), where uniqueness of the minimizers in (7)-(11) can be ensured.

Implementation of the proposed method involves the two projections P_{L_m} and P_{M_m} mentioned in Section 4. Due to the convexity and closedness of L_m and M_m, P_{L_m} and P_{M_m} exist and are unique. The projection P_{L_m}(B) of a symmetric matrix B onto L_m is the solution of

min_L f(L) = ‖L − B‖_F² subject to L = L⊤, L1_m = 0_m, −W ≤ l_ij ≤ 0 for i ≠ j,   (41)

where L = (l_ij) is a graph Laplacian. The objective function f(L) is convex quadratic since its Hessian 2I_{m²} is strictly positive definite. The three constraints, corresponding to the three properties (P0)-(P2) of graph Laplacians, are all linear. It immediately follows that (41) is a convex quadratic optimization problem, which can be solved using quadratic programming. We use the osqp (Stellato et al. 2020) package in R (R Core Team 2021) to solve this optimization problem. Note that M_m is a bounded subset of the positive semi-definite cone S⁺_m. To implement P_{M_m}, we first take the projection onto S⁺_m and then truncate the eigenvalues to ensure that the largest eigenvalue is less than or equal to C^α. The projection P_{S⁺_m} onto S⁺_m is straightforward and has been studied in Boyd et al. (2004, Chapter 8, Page 399): for a symmetric matrix S with spectral decomposition S = UΛU⊤, the unique solution is P_{S⁺_m}(S) = UΛ₊U⊤, where Λ₊ replaces the negative entries of Λ by 0.

To assess the performance of the global and local DNR estimates in (9) and (11) through simulations, we need to devise a generative model. Denote the half vectorization excluding the diagonal of a symmetric and centered matrix by vech, with inverse operation vech⁻¹. By the symmetry and centrality properties (P0) and (P1), every graph Laplacian L is fully determined by its upper (or lower) triangular part, which can be further vectorized into vech(L), a vector of length d := m(m − 1)/2. We hence construct the conditional distributions F_{L|X} by assigning an independent beta distribution to each element of vech(L). Specifically, a random sample of size d, (β₁, ..., β_d), is generated using beta distributions whose parameters depend on the scalar predictor X and vary under different simulation scenarios. The random response L is then generated conditional on X through the inverse half vectorization vech⁻¹ applied to (−β₁, ..., −β_d). Four different simulation scenarios involving different types of regression and different metrics are described in Table 1. The space of graph Laplacians L_m is not a vector space. Instead, it is a bounded, closed, and convex subset of R^{m^2} of dimension m(m − 1)/2, as shown in Proposition 1. To ensure that the random response L generated in our simulation resides in L_m, one needs to make sure that the off-diagonal entries −β_i, i = 1, ..., d, are nonpositive and bounded below as per (P2). The other two properties (P0) and (P1) are guaranteed by the inverse half vectorization vech⁻¹. To this end, in our simulation the β_i, i = 1, ..., d, are sampled from beta distributions, which are supported on the interval (0, 1) and whose parameters depend on the uniformly distributed predictor X. The consistency of global DNR relies on the assumption that the true regression function m(x) equals m_G(x) as defined in (7) and (8), respectively.
This assumption is satisfied if each entry of the graph Laplacian L is conditionally linear in the predictor X. For the global DNR under the Euclidean square root metric d_{F,1/2}, an extra matrix square map F₂ is required to ensure that the global DNR in the metric space (M_m, d_F) as per (25) with α = 1/2 is element-wise linear in X.

[Table 1: Four different simulation scenarios, where m is the true regression function and L represents the generated random response. The parameters of the beta distributions of the random variables β_j depend on the predictor X as indicated for simulation scenarios I-IV; for example, in one scenario β_j ∼ Beta(sin(πX), 1 − sin(πX)).]

We investigated sample sizes n = 50, 100, 200, 500, 1000, with Q = 1000 Monte Carlo runs for each simulation. In each iteration, random samples of pairs (X_k, L_k), k = 1, ..., n, were generated by sampling X_k ∼ U(0, 1), setting m = 10, and following the above procedure. For the qth simulation of a particular sample size, with m̂_q(x) denoting the fitted regression function, the quality of the estimation was measured quantitatively by the integrated squared error (ISE),

ISE_q = ∫₀¹ d²(m(x), m̂_q(x)) dx,

where m(x) is the true regression function. The average quality of the estimation over the Q = 1000 Monte Carlo runs was assessed by the mean integrated squared error (MISE),

MISE = Q⁻¹ Σ_{q=1}^Q ISE_q.

The bandwidths for the local DNR in Scenarios III and IV were chosen using a leave-one-out cross-validation criterion. The ISE for each simulation run and different sample sizes under the different simulation scenarios are summarized in the boxplots in Figure 2. Both global DNR and local DNR performed similarly for the Euclidean metric d_F and the Euclidean square root metric d_{F,1/2}. We observe decreasing ISE for increasing sample sizes, demonstrating the convergence of DNR to the target under various regression settings.
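The generative mechanism of this simulation design, together with a grid approximation of the ISE, can be sketched as follows; the particular beta parameters (conditional mean x) are an illustrative choice and not one of the paper's four scenarios verbatim:

```python
import numpy as np

rng = np.random.default_rng(1)

def vech_inv_laplacian(beta, m):
    """vech⁻¹: build a graph Laplacian from d = m(m-1)/2 edge weights beta,
    so that properties (P0) and (P1) hold by construction."""
    W = np.zeros((m, m))
    W[np.triu_indices(m, k=1)] = beta
    W = W + W.T
    return np.diag(W.sum(axis=1)) - W

def sample_network(x, m=10):
    """One response L | X = x with independent beta-distributed edge weights.
    Beta(2x, 2(1-x)) has mean x; this linear-in-x choice is hypothetical."""
    d = m * (m - 1) // 2
    beta = rng.beta(2 * x, 2 * (1 - x), size=d)
    return vech_inv_laplacian(beta, m)

def ise(m_hat, m_true, grid):
    """Integrated squared error ∫ d_F²(m(x), m̂(x)) dx, Riemann-approximated."""
    errs = [np.linalg.norm(m_hat(x) - m_true(x)) ** 2 for x in grid]
    return float(np.mean(errs) * (grid[-1] - grid[0]))
```

Averaging the per-run ISE values over Q Monte Carlo runs then yields the MISE reported in the boxplots and the log-log rate plots.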
That the asymptotic rates of convergence in Section 4 are fairly accurate even for finite samples can be seen from the scatter plots of log(MISE) versus log(n) in Figure 3, which also include the fitted least squares regression line, whose slope indicates the empirical rate of convergence. Recall that the theoretical rates of convergence under the four simulation scenarios are O_P(n^{−1/2}), O_P(n^{−1/2}), O_P(n^{−2/5}), and O_P(n^{−2/5}), respectively, as per (12) and the results of Section 4; the fitted slopes in Figure 3 are close to these theoretical values. This remarkable agreement of theory and empirical behavior supports the relevance of the theory and also suggests that these rates are empirically optimal.

6 Data Applications

We focus on yellow taxi trip records in the Manhattan area, which has the highest taxi traffic, predominantly provided by yellow taxis, and grouped the 66 taxi zones (excluding islands) to serve as the nodes of the transport networks. Since the weekend indicator is a binary predictor, we applied global DNR using both metrics. The quantity R²⊕ = 1 − E[d²(L, m_G(X))]/V⊕, an extension of the coefficient of determination R² for linear regression, can be used to quantify the proportion of response variation "explained" by the covariates. The corresponding sample version is R̂²⊕ = 1 − Σ_k d²(L_k, m̂_G(X_k)) / Σ_k d²(L_k, ω̂⊕), where ω̂⊕ = argmin_{ω ∈ L_m} Σ_k d²(L_k, ω); it was found to be R̂²⊕ = 0.433 for d_F and R̂²⊕ = 0.453 for d_{F,1/2}, which further lends support to the use of d_{F,1/2} in this specific application. True and fitted networks using the Euclidean square root metric d_{F,1/2} for a few days are shown in Figure 6. To better illustrate the effect of COVID-19 new cases and weekends on network structures, Figure 7 shows the same predicted networks, but with each heatmap having its own color range. This shows that higher case numbers lead to structural changes in traffic more on weekends than on weekdays, likely because weekend travel is more driven by leisure activities; a pronounced reduction in traffic with increasing case numbers is visible for the area which includes the Financial District and the World Trade Center.
This likely reflects that more people work from home with increasing case numbers and demonstrates the flexibility of the fits obtained from DNR.

The increasing availability of neuroimaging data, such as functional magnetic resonance imaging (fMRI) data, has accelerated the investigation of age-related changes in human brain network organization. Resting-state fMRI (rs-fMRI), as an important modality of fMRI data acquisition, has been widely used to study normal aging, which is known to be associated with cognitive decline, even in individuals without any retrogressive disease process (Ferreira and Busatto 2013; Sala-Llonch et al. 2014, 2015). FMRI measures brain activity by detecting changes in blood-oxygen-level-dependent (BOLD) signals in the brain across time. During recordings of rs-fMRI, subjects relax during the sequential acquisition of fMRI scans. Spontaneous fluctuations in brain activity during rest are reflected by low-frequency oscillations of the BOLD signal, recorded as voxel-specific time series of activation strength. Network-based analyses of brain functional connectivity at the subject level typically rely on a specific spatial parcellation of the brain into a set of regions of interest (ROIs) (Bullmore and Sporns 2009). Temporal coherence between pairwise ROIs is usually measured by so-called temporal Pearson correlation coefficients (PCC) of the fMRI time series, forming an m × m correlation matrix when considering m distinct ROIs, which can be viewed as a basic measurement of functional connectivity. Hard or soft thresholding (Schwarz and McGonigle 2011) is customarily applied to produce a binary or weighted functional connectivity network. Based on the functional connectivity network, the most commonly used network measures of functional connectivity, reflecting centrality of individual brain regions, integration, segregation, etc., can be derived (Rubinov and Sporns 2010).
Among these, the clustering coefficient, characteristic path length, and small-worldness are key global topological measures that are widely used in the literature (Onoda and Yamaguchi 2013; Sala-Llonch et al. 2014). The clustering coefficient is a measure of the cliquishness of connections between nodes from a topological point of view, indicating the extent of local interconnectivity in a network. The characteristic path length quantifies the ability for information to be propagated in parallel, where lower values indicate higher routing efficiency. Small-worldness measures the balance between functional segregation (the presence of functionally specialized modules) and integration (a robust number of intermodular links), indicating the extent of global and local efficiency (Watts and Strogatz 1998). Local DNR can be employed to investigate healthy aging by modeling the relationship between brain functional connectivity networks and age. One can infer age-related changes in brain network organization based on the dynamics of predicted networks and study the effects of healthy aging on brain network topology using, e.g., the clustering coefficient, characteristic path length, and small-worldness.

Data used in our study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), where n = 404 cognitively normal elderly subjects at ages ranging from 55.61 to 95.39 years participated in the study; one rs-fMRI scan was randomly selected if multiple scans were available for a subject. We used the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002) to parcellate the whole brain into 90 ROIs, with 45 ROIs in each hemisphere. Details about the ROIs can be found in Section S.4 of the Supplementary Material (Table 3). Preprocessing was carried out in MATLAB using the Statistical Parametric Mapping (SPM12, www.fil.ion.ucl.ac.uk/spm) and Resting-State fMRI Data Analysis Toolkit V1.8 (REST1.8, http://restfmri.net/forum/?q=rest) software. Briefly, this included the removal of artifacts from head movement, correction for differences in image acquisition time between slices, normalization to the standard space, spatial smoothing, and temporal filtering (bandpass filtering of 0.01-0.1 Hz). The mean time course of the voxels within each ROI was then extracted for network construction. A PCC matrix was calculated for all time course pairs for each subject. These matrices were then converted into simple, undirected, weighted networks by setting diagonal entries to 0 and thresholding the absolute values of the remaining correlations. Since most network theoretic measures are sensitive to variations in the number of edges in a network, we used density-based thresholding (Fornito et al. 2016), where the threshold is allowed to vary from subject to subject to achieve a desired, fixed connection density. Specifically, in our analyses the 15% strongest connections were kept to achieve a density of 0.15 for each subject. We implemented local DNR with the graph Laplacians corresponding to the networks constructed from PCC matrices as responses and with age as scalar-valued covariate. The bandwidth for age was chosen to minimize the prediction error using a leave-one-out cross-validation criterion, resulting in a bandwidth of h = 0.20. Prediction was performed at four different ages: 65, 70, 75, and 80 (approximately the 20%, 40%, 60%, and 80% quantiles of the age distribution of the 404 subjects). The predicted networks are shown in Figure 8, where the nodes were placed using the Fruchterman-Reingold layout algorithm (Fruchterman and Reingold 1991) to achieve optimal visualization. Spectral clustering (Newman 2006) was applied to detect the community structure in each network, where different communities are distinguished by different colors.
The numbers of communities at ages 65, 70, 75, and 80 are 10, 12, 12, and 16, respectively. Communities with at least 10 nodes are highlighted using colored polygons. These communities are found to be associated with different anatomical regions of the brain (see Table 3 in Section S.4 of the Supplementary Material); when a community intersects multiple anatomical regions, it is identified with the region to which the majority of its nodes belong. As can be seen in Figure 8, the communities associated with the central region, the parietal lobe, and the limbic lobe disintegrate into several smaller communities with increasing age. This finding suggests age-related increases in local interconnectivity or cliquishness. High cliquishness is known to be associated with a reduced capability to rapidly combine specialized information from distributed brain regions, which may contribute to cognitive decline in healthy elderly adults (Sala-Llonch et al. 2015). We also used the global network measures (small-worldness, clustering coefficient, and characteristic path length) to quantify age-related changes in brain functional integration and segregation. Specifically, the three measures were computed from the fitted network for each subject. The results are shown in Figure 9. We observe that small-worldness decreases with increasing age, although this decline is relatively slow, indicating lower global and local efficiency of information processing for healthy elderly adults. As a measure of cliquishness, the clustering coefficient increases with increasing age, which coincides with the results in Figure 8. Characteristic path length, however, does not seem to change much with age. Additionally, we find that the brain functional networks are small-world networks, as the small-worldness for each subject is greater than 1 (Achard and Bullmore 2007; Humphries and Gurney 2008).
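The three global measures discussed above can be sketched with networkx. This is our own simplification on a binarized, connected graph, using the common Erdős–Rényi approximations C_rand ≈ k/n and L_rand ≈ ln n / ln k for the random baseline of the small-worldness index; the analysis in the paper may use weighted variants:

```python
import math
import networkx as nx

def global_measures(G):
    """Clustering coefficient, characteristic path length, and a
    small-worldness index sigma = (C / C_rand) / (L / L_rand)."""
    C = nx.average_clustering(G)                # cliquishness
    L = nx.average_shortest_path_length(G)      # parallel information flow
    n = G.number_of_nodes()
    k = 2 * G.number_of_edges() / n             # mean degree
    C_rand = k / n                              # Erdos-Renyi approximations
    L_rand = math.log(n) / math.log(k)
    return C, L, (C / C_rand) / (L / L_rand)

# toy example: a 90-node small-world graph stands in for a brain network
G = nx.connected_watts_strogatz_graph(90, 10, 0.1, seed=0)
C, L, sigma = global_measures(G)
print(sigma > 1)  # small-world networks have sigma > 1
```

The Humphries and Gurney (2008) criterion cited in the text corresponds to declaring a network small-world when this sigma exceeds 1.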
Figure 8: Topological representation using spectral community detection for predicted functional connectivity networks at different ages (years). Communities with at least 10 ROIs are highlighted using colored polygons. These communities are found to be associated with different anatomical regions of the brain (see Table 3 in Section S.4 of the Supplementary Material).

Figure 9: Scatter plots of global network measures calculated from the fitted networks against age for healthy elderly adults. From left to right: small-worldness, clustering coefficient, and characteristic path length.

To obtain rates of convergence for the global and local DNR estimators, we require the following assumptions, which parallel those in Petersen and Müller (2019) and Chen and Müller (2020). For ease of presentation, we replace the graph Laplacian $L$ in global and local DNR introduced in Section 3 by a general random object $Y$ taking values in an arbitrary metric space $(\Omega, d)$ and follow the same notation there. Consequently, the following assumptions apply to any random object taking values in a metric space. We will verify that the two metric spaces, $(\mathcal{L}_m, d_F)$ and $(\mathcal{M}_m, d_F)$, satisfy these assumptions, which lays the basis for the derivation of rates of convergence for the corresponding estimators. The following assumptions are required to obtain consistency and rates of convergence of $\widehat{m}_G(x)$. For a fixed $x \in \mathbb{R}^p$:

(GP0) The objects $m_G(x)$ and $\widehat{m}_G(x)$ exist and are unique, the latter almost surely, and, for any $\varepsilon > 0$, $\inf_{d(\omega, m_G(x)) > \varepsilon} M_G(\omega, x) > M_G(m_G(x), x)$, where $M_G(\cdot, x)$ denotes the objective function minimized by $m_G(x)$.

(GP1) Let $B_\delta(m_G(x)) \subset \Omega$ be the ball of radius $\delta$ centered at $m_G(x)$ and $N(\varepsilon, B_\delta(m_G(x)), d)$ be its covering number using balls of size $\varepsilon$. Then
$$\int_0^1 \sqrt{1 + \log N(\delta\varepsilon, B_\delta(m_G(x)), d)}\, d\varepsilon = O(1) \quad \text{as } \delta \to 0+.$$

(GP2) There exist $\eta_0 > 0$, $C_0 > 0$ and $\gamma_0 > 1$, possibly depending on $x$, such that
$$M_G(\omega, x) - M_G(m_G(x), x) \geq C_0\, d(\omega, m_G(x))^{\gamma_0} \quad \text{whenever } d(\omega, m_G(x)) < \eta_0.$$

Uniform convergence results require stronger versions of the above assumptions. Let $\|\cdot\|_E$ be the Euclidean norm on $\mathbb{R}^p$ and $B > 0$ a given constant.
(GU0) Almost surely, for all $\|x\|_E < B$, the objects $m_G(x)$ and $\widehat{m}_G(x)$ exist and are unique. Additionally, for any $\varepsilon > 0$,
$$\inf_{\|x\|_E < B}\; \inf_{d(\omega, m_G(x)) > \varepsilon} \left\{ M_G(\omega, x) - M_G(m_G(x), x) \right\} > 0,$$
and there exists $\zeta = \zeta(\varepsilon) > 0$ such that
$$P\left( \inf_{\|x\|_E < B}\; \inf_{d(\omega, \widehat{m}_G(x)) > \varepsilon} \left\{ \widehat{M}_G(\omega, x) - \widehat{M}_G(\widehat{m}_G(x), x) \right\} \geq \zeta \right) \to 1,$$
where $\widehat{M}_G(\cdot, x)$ denotes the empirical version of $M_G(\cdot, x)$.

(GU1) With $B_\delta(m_G(x))$ and $N(\varepsilon, B_\delta(m_G(x)), d)$ as in (GP1),
$$\int_0^1 \sup_{\|x\|_E < B} \sqrt{1 + \log N(\delta\varepsilon, B_\delta(m_G(x)), d)}\, d\varepsilon = O(1) \quad \text{as } \delta \to 0+.$$

(GU2) There exist $\tau_0 > 0$, $D_0 > 0$ and $\rho_0 > 1$ such that
$$\inf_{\|x\|_E < B} \left\{ M_G(\omega, x) - M_G(m_G(x), x) - D_0\, d(\omega, m_G(x))^{\rho_0} \right\} \geq 0 \quad \text{whenever } d(\omega, m_G(x)) < \tau_0.$$

We require the following assumptions to obtain pointwise rates of convergence for $\widehat{m}_{L,n}(x)$. For simplicity, we assume that the marginal density $f_X(\cdot)$ of $X$ has unbounded support, and consider $x \in \mathbb{R}$ with $f_X(x) > 0$.

(LP0) The kernel $K(\cdot)$ is a probability density function, symmetric around zero. Furthermore, defining $K_{kj} = \int_{\mathbb{R}} K^k(u) u^j\, du$, $|K_{14}|$ and $|K_{26}|$ are both finite.

(LP1) The marginal density $f_X(\cdot)$ of $X$ and the conditional densities $f_{X|Y}(\cdot, y)$ of $X$ given $Y = y$ exist and are twice continuously differentiable, the latter for all $y \in \Omega$, and for any open set $U \subset \Omega$, $\int_U dF_{Y|X}(x, y)$ is continuous as a function of $x$.

(LP2) The minimizers $m(x)$, $m_{L,h}(x)$ and $\widehat{m}_{L,n}(x)$ exist and are unique, the last almost surely. Additionally, for any $\varepsilon > 0$,
$$\inf_{d(\omega, m(x)) > \varepsilon} M(\omega, x) > M(m(x), x), \qquad \liminf_{h \to 0}\; \inf_{d(\omega, m_{L,h}(x)) > \varepsilon} \left\{ M_{L,h}(\omega, x) - M_{L,h}(m_{L,h}(x), x) \right\} > 0.$$

(LP3) Let $B_\delta(m(x)) \subset \Omega$ be the ball of radius $\delta$ centered at $m(x)$ and $N(\varepsilon, B_\delta(m(x)), d)$ be its covering number using balls of size $\varepsilon$. Then
$$\int_0^1 \sqrt{1 + \log N(\delta\varepsilon, B_\delta(m(x)), d)}\, d\varepsilon = O(1) \quad \text{as } \delta \to 0+.$$

(LP4) There exist $\eta_1, \eta_2 > 0$, $C_1, C_2 > 0$ and $\gamma_1, \gamma_2 > 1$ such that
$$M(\omega, x) - M(m(x), x) \geq C_1\, d(\omega, m(x))^{\gamma_1} \quad \text{whenever } d(\omega, m(x)) < \eta_1,$$
$$\liminf_{h \to 0} \left\{ M_{L,h}(\omega, x) - M_{L,h}(m_{L,h}(x), x) \right\} \geq C_2\, d(\omega, m_{L,h}(x))^{\gamma_2} \quad \text{whenever } d(\omega, m_{L,h}(x)) < \eta_2.$$

Obtaining uniform rates of convergence for local DNR is more involved and requires stronger assumptions. Suppose $T$ is a closed interval in $\mathbb{R}$, and denote the interior of $T$ by $T^o$.

(LU0) The kernel $K(\cdot)$ is a probability density function, symmetric around zero, and uniformly continuous on $\mathbb{R}$. Furthermore, defining $K_{kj} = \int_{\mathbb{R}} K^k(u) u^j\, du$, $|K_{14}|$ and $|K_{26}|$ are both finite. The derivative $K'$ exists and is bounded on the support of $K$, i.e., $\sup_{x: K(x) > 0} |K'(x)| < \infty$.

(LU1) The marginal density $f_X(\cdot)$ of $X$ and the conditional densities $f_{X|Y}(\cdot, y)$ of $X$ given $Y = y$ both exist and are continuous on $T$ and twice continuously differentiable on $T^o$, the latter for all $y \in \Omega$.
The marginal density $f_X(\cdot)$ is bounded away from zero on $T$, $\inf_{x \in T} f_X(x) > 0$. The second-order partial derivatives $(\partial^2 f_X/\partial x^2)(\cdot)$ and $(\partial^2 f_{X|Y}/\partial x^2)(\cdot, y)$ are uniformly bounded, for any open set $U \subset \Omega$, $\int_U dF_{Y|X}(x, y)$ is continuous as a function of $x$, and for any $x \in T$, $M(\cdot, x)$ is equicontinuous, i.e., the family $\{M(\omega, \cdot) : \omega \in \Omega\}$ is equicontinuous at $x$.

(LU2) For all $x \in T$, the minimizers $m(x)$, $m_{L,h}(x)$ and $\widehat{m}_{L,n}(x)$ exist and are unique, the last almost surely. Additionally, for any $\varepsilon > 0$,
$$\inf_{x \in T}\; \inf_{d(\omega, m(x)) > \varepsilon} \left\{ M(\omega, x) - M(m(x), x) \right\} > 0,$$
and there exists $\zeta = \zeta(\varepsilon) > 0$ such that
$$P\left( \inf_{x \in T}\; \inf_{d(\omega, \widehat{m}_{L,n}(x)) > \varepsilon} \left\{ \widehat{M}_{L,n}(\omega, x) - \widehat{M}_{L,n}(\widehat{m}_{L,n}(x), x) \right\} \geq \zeta \right) \to 1.$$

(LU3) With $B_\delta(m(x))$ and $N(\varepsilon, B_\delta(m(x)), d)$ as in (LP3),
$$\int_0^1 \sup_{x \in T} \sqrt{1 + \log N(\delta\varepsilon, B_\delta(m(x)), d)}\, d\varepsilon = O(1) \quad \text{as } \delta \to 0+.$$

(LU4) There exist $\tau_1, \tau_2 > 0$, $D_1, D_2 > 0$ and $\rho_1, \rho_2 > 1$ such that
$$\inf_{x \in T} \left\{ M(\omega, x) - M(m(x), x) - D_1\, d(\omega, m(x))^{\rho_1} \right\} \geq 0 \quad \text{whenever } d(\omega, m(x)) < \tau_1,$$
with the analogous condition holding for $M_{L,h}$ with constants $\tau_2$, $D_2$, $\rho_2$.

Proof of Proposition 1

Proof. Since $\mathcal{L}_m$ is a bounded, closed, and convex subset of $\mathbb{R}^{m^2}$ equipped with the Frobenius metric $d_F$, the minimizers in (GP0), (GU0), (LP2), and (LU2) exist, are unique, and satisfy
$$M(\omega, x) - M(m(x), x) \geq d_F^2(\omega, m(x)) \quad \text{for all } \omega \in \mathcal{L}_m.$$
Consequently, we may take $\eta_i$ and $\tau_i$ arbitrary, $C_i = D_i = 1$ and $\gamma_i = \rho_i = 2$ for $i = 0, 1, 2$ in (GP2), (GU2), and (LP4), (LU4). Next, we show that (GU1) holds, which then implies (GP1). Since $\mathcal{L}_m$ is a subset of $\mathbb{R}^{m^2}$, for any $\omega \in \mathcal{L}_m$ we have
$$N(\delta\varepsilon, B_\delta(\omega), d_F) = N(\varepsilon, B_1(\omega), d_F) \leq (1 + 2/\varepsilon)^{m^2}.$$
Thus, using $1 + 2/\varepsilon \leq 3/\varepsilon$ for $\varepsilon \in (0, 1]$ and the substitution $y = \log(3/\varepsilon)$, the integral in (GU1) is bounded by a constant multiple of $\int \sqrt{y}\, e^{-y}\, dy < \infty$. Since this bound does not depend on $\delta$, (GU1) holds and thus (GP1) as well. Likewise we can show that (LU3) and (LP3) also hold. Theorem 2 in Petersen and Müller (2019) then yields rates of convergence for the global DNR estimator. For the local DNR estimator, rates of convergence can be obtained using Corollary 1 in Petersen and Müller (2019) and Theorem 1 in Chen and Müller (2020).

Proof. Recall that the matrix power map $F_\alpha$ is defined as
$$F_\alpha(S) = U \Lambda^\alpha U^\top,$$
where $U \Lambda U^\top$ is the spectral decomposition of $S$. Specifically, denote the eigenvalues of $S$ by $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq 0$. Then $F_\alpha(S) = U\, \mathrm{diag}(\lambda_1^\alpha, \lambda_2^\alpha, \ldots, \lambda_m^\alpha)\, U^\top$. Note that the power function $f(x) = x^\alpha: [0, \infty) \to [0, \infty)$ is $(\alpha, 1)$-Hölder continuous on $[0, \infty)$ for $0 < \alpha < 1$, and is $\alpha C^{\alpha-1}$-Lipschitz continuous on $[0, C]$ for $\alpha \geq 1$.
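The spectral definition of $F_\alpha$ above translates directly into code; a minimal numpy sketch (the helper name `matrix_power` is ours):

```python
import numpy as np

def matrix_power(S, alpha):
    """F_alpha(S) = U diag(lambda_i^alpha) U^T for a symmetric positive
    semi-definite matrix S with spectral decomposition S = U Lambda U^T."""
    lam, U = np.linalg.eigh(S)          # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)       # guard tiny negative round-off values
    return (U * lam**alpha) @ U.T       # equals U @ diag(lam**alpha) @ U.T

# sanity checks on a random positive semi-definite matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
S = A @ A.T                             # PSD by construction
assert np.allclose(matrix_power(S, 1.0), S)
assert np.allclose(matrix_power(S, 0.5) @ matrix_power(S, 0.5), S)
```

Since the map acts only on the eigenvalues, properties such as $F_{1/\alpha}(F_\alpha(S)) = S$ follow immediately from the corresponding scalar identities.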
Results (1) and (2) then follow.

References
Efficiency and cost of economical brain functional networks
Barycenters in the Wasserstein space
Geometric means in a novel vector space structure on symmetric positive-definite matrices
Large sample theory of intrinsic and extrinsic sample means on manifolds - I
Large sample theory of intrinsic and extrinsic sample means on manifolds - II
Convex Optimization
Complex brain networks: graph theoretical analysis of structural and functional systems
Populations of unlabeled networks: Graph space geometry and geodesic principal components
Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space
Uniform convergence of local Fréchet regression, with applications to locating extrema and time warping for metric-space valued trajectories
Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications
Best Approximation in Inner Product Spaces
Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging
Statistical Shape Analysis: with Applications in R
Power Euclidean metrics for covariance matrices with application to diffusion tensor imaging
Fréchet analysis of variance for random objects
Fréchet change-point detection
Modeling time-varying random objects and dynamic networks
Resting-state functional connectivity in normal brain aging
Fundamentals of Brain Network Analysis
Les éléments aléatoires de nature quelconque dans un espace distancié
Graph drawing by force-directed placement. Software: Practice and Experience
Hypothesis testing for network data in functional neuroimaging
On the meaning of mean shape: manifold stability, locus and the two sample test
Network 'small-world-ness': a quantitative method for determining canonical network equivalence
Social Network Analysis
Averages of unlabeled networks: Geometric characterization and asymptotic behavior
Riemannian geometry of symmetric positive definite matrices via Cholesky decomposition
Distance covariance in metric spaces
Overview of object oriented data analysis
Peter Hall, functional data analysis and random objects
Finding community structure in networks using the eigenvectors of matrices
Small-worldness and modularity of the resting-state functional brain network decrease with aging
Fréchet integration and adaptive metric selection for interpretable covariances of multivariate functional data
Fréchet regression for random objects with Euclidean predictors
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Complex network measures of brain connectivity: uses and interpretations
Reorganization of brain networks in aging: a review of functional connectivity studies
Changes in whole-brain functional networks and memory performance in aging
Regression in nonstandard spaces with Fréchet and geodesic approaches
Negative edges and soft thresholding in complex network analysis of resting state functional connectivity data
Manifold valued data analysis of samples of networks
OSQP: an operator splitting solver for quadratic programs
Probability measures on metric spaces of nonpositive curvature
A spatial modeling approach for linguistic object data: Analyzing dialect sound variations across Great Britain
Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain
Public transport networks: empirical analysis and modeling
Object oriented data analysis: Sets of trees
Network modeling in biology: statistical methods for gene and brain networks
Collective dynamics of 'small-world' networks
On the Hölder continuity of matrix functions for normal matrices
Regularisation, interpolation and visualisation of diffusion tensor images using non-Euclidean statistics

One can similarly show that the metric space $(\mathcal{M}_m, d_F)$ satisfies assumptions (GP0)-(GP2); according to Theorem 2 in Petersen and Müller (2019), rates of convergence for the metric space $(\mathcal{M}_m, d_F)$ then follow analogously.

Table 2: Details about 13 regions in Manhattan
Region 101: 12 Battery Park, 13 Battery Park City, 87 Financial District North, 88 Financial District South, ..., Little Italy/NoLiTa, 158 Meatpacking/West Village West, 211 SoHo, 249 West Village
Region 103: 4 Alphabet City, ..., 90 Flatiron, 246 West Chelsea/Hudson Yards
Region 105: 100 Garment District, 161 Midtown Center, 163 Midtown North, 164 Midtown South, 186 Penn Station/Madison Sq West, 230 Times Sq/Theatre District, ..., Union Sq
Region 106: 152 Manhattanville, 166 Morningside Heights
Region 110: 41 Central ..., 244 Washington Heights South
Region 164: 43 Central Park
Data used in this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The investigators within ADNI did not participate in the analysis or writing of this study. A complete list of ADNI investigators can be found online. This research was supported in part through National Science Foundation grant DMS-2014626. Taxi trip records, including passenger counts, collected by the New York City Taxi and Limousine Commission (NYC TLC), are publicly available at https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. The taxi zone map for Manhattan (Figure 10(a)), available at https://www1.nyc., shows the boundaries of the zones for taxi pick-ups and drop-offs as delimited by the NYC TLC. We excluded the islands (103 Liberty Island, 104 Ellis Island, 105 Governor's Island) from our study. The remaining 66 zones in Manhattan can be grouped into 13 regions as delimited in Figure 10(b) (source: https://communityprofiles.planning.nyc.gov). For the composition of these 13 regions, see Table 2. These regions are listed in Table 3.
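As an illustration of how such zone-level trip records induce graph Laplacians, the responses in DNR, here is a minimal sketch; the column names `pickup_zone` and `dropoff_zone` and the helper `trip_network_laplacian` are our own, not the TLC schema:

```python
import numpy as np
import pandas as pd

def trip_network_laplacian(trips, zones):
    """Build a simple, undirected, weighted network from taxi trip records,
    with edge weights given by trip counts between zone pairs, and return
    its graph Laplacian L = D - W."""
    m = len(zones)
    idx = {z: i for i, z in enumerate(zones)}
    W = np.zeros((m, m))
    for _, row in trips.iterrows():
        i, j = idx[row["pickup_zone"]], idx[row["dropoff_zone"]]
        if i != j:                        # simple network: drop self-loops
            W[i, j] += 1
            W[j, i] += 1                  # symmetrize: undirected edge weight
    return np.diag(W.sum(axis=1)) - W     # graph Laplacian

# toy example with three of the Manhattan zone IDs mentioned above
trips = pd.DataFrame({
    "pickup_zone":  [12, 13, 12, 87],
    "dropoff_zone": [13, 87, 13, 12],
})
L = trip_network_laplacian(trips, zones=[12, 13, 87])
print(L)
```

By construction each row of the Laplacian sums to zero and off-diagonal entries are non-positive, which is the characterization of the space $\mathcal{L}_m$ exploited in the paper.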