Sequential Learning of Active Subspaces
Nathan Wycoff, Mickael Binois, and Stefan M. Wild
2019-07-26

Abstract. In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, often a surrogate model is employed, on which finite differencing is performed. When the surrogate model is a Gaussian process, we show that the ASM estimator is available in closed form, rendering the finite-difference approximation unnecessary. We use our closed-form solution to develop acquisition functions focused on sequential learning tailored to sensitivity analysis on top of ASMs. We also show that the traditional ASM estimator may be viewed as a method of moments estimator for a certain class of Gaussian processes. We demonstrate how uncertainty on Gaussian process hyperparameters may be propagated to uncertainty on the sensitivity analysis, allowing model-based confidence intervals on the active subspace. Our methodological developments are illustrated on several examples.

Simulation-based computer experiments are pervasive in diverse fields ranging from engineering to social science. Some of these simulations are so time- and resource-consuming that carefully selecting the experiments to be run is a keystone concern. A standard technique in this context is to replace the expensive simulator by an inexpensive surrogate (e.g., a quadratic response surface, a radial basis function, or a Gaussian process (GP)). Such approaches have been shown to be efficient, with the downside that performance drops sharply when the number of variables determining the experiment becomes more than a few dozen.

Many efforts deal with this "curse of dimensionality" (as coined by Bellman (2003)) from different points of view. Sensitivity analysis is concerned with determining the relative importance of each variable with respect to a quantity of interest, using methods such as screening or measures of influence; we refer to the work of Iooss and Lemaître (2015) for a review. Another option is to assume additional structure about the problem at hand, such as additivity. The idea is to decompose the multivariate function f into a sum of functions, each of fewer variables, decreasing the amount of data needed for reliable inference. For GPs, purely additive examples include those of Durrande et al. (2012) and Duvenaud et al. (2011), while combinations of variables are considered, for instance, by Durrande et al. (2013) and Sung et al. (2019). Learning a latent space has also been attempted, for example with a generative topographic mapping (Viswanath et al., 2011) or internally in the GP model (Titsias and Lawrence, 2010). Related to this latter method is detecting the presence of an active subspace (Constantine, 2015), the framework in which this article operates (see Section 2.1 for a formal definition). The principle is to find the directions accounting for most of the variation in the quantity of interest and then to conduct a study with a surrogate on those main directions only.
This approach has been shown to perform well on a variety of problems, is easy to interpret because it corresponds to learning important linear combinations of the inputs, and has a theoretical foundation. It additionally allows for some visualization when the number of directions appears (or is selected) to be small. The methodology typically consists of a random sampling of the high-dimensional design space to identify the active subspace, followed by a parameter study therein. While some guidelines for the primary stage exist, the methodology remains unsatisfying for several reasons. First, it requires gradient observations, which are not always available for legacy simulators or noisy problems (Larson et al., 2019). Relying on finite-difference approximations is an option, even for noisy simulators (Moré and Wild, 2012), but it has been shown to be less desirable (Palar and Shimoyama, 2018). Directional derivatives may be used as well, from a low-rank matrix recovery perspective (Djolonga et al., 2013; Constantine et al., 2015), but these still demand more costly observations if not directly available. As a result, replacing these derivative observations with those from a surrogate (e.g., a GP) is a standard approach (Fukumizu and Leng, 2014; Othmer et al., 2016; Palar and Shimoyama, 2018). One must, however, still quantify the additional amount of uncertainty introduced. Second, unless a surrogate is used, the learned subspace cannot benefit from experiments run during the second stage because of the assumption of independent and identically distributed (iid) gradient sampling (because, for instance, optimization will typically involve function evaluations clustered in specific locations).

In the GP literature, several authors have bypassed the two-stage approach by directly estimating the directions of influence, treating those as additional hyperparameters; see, for example, the work of Garnett et al. (2014) and Tripathy et al. (2016). As a byproduct, this approach does not require gradient observations (although such observations can be accommodated) and does not require iid samples. Nevertheless, the proposed estimation procedures become much more complicated, requiring a variety of approximations. A major benefit of not requiring iid samples for active subspace estimation is to enable sequential learning procedures, that is, adding optimally informative points with respect to a quantity of interest.

In this article, we bridge the gap between two-stage and direct estimation procedures for GPs, alleviating some of the limitations of recent active subspace methods. Our contributions can be summarized as follows:

• We prove that the classical active subspace estimator is a method of moments estimator for a certain class of GP models, formally establishing a link where similarities have previously been noted.
• We provide a closed-form expression for the matrix defining the active subspace for a GP for popular covariance kernels, rendering the Monte Carlo (MC) finite differencing on top of a surrogate unnecessary.
• Our closed-form solution allows sequential design to be carried out efficiently, allowing us to gain better estimates of the active subspace with fewer black-box evaluations, a feature especially important when these evaluations are computationally expensive.
• We show how interval estimates may be calculated quickly via MC on the active subspace by propagating uncertainty from the GP hyperparameters.

This paper is organized as follows.
Section 2 presents the active subspace approach as well as background on GPs. Section 3 contains our methodological contributions, before illustrating them empirically in Section 4. We present our conclusions in Section 5.

We first review the active subspace and GP paradigms. The concept of active subspaces (Constantine et al., 2014) is appealing for fields from uncertainty quantification (UQ) to inverse problems. Informally, active subspace methods (ASMs) involve determining along which directions a scalar function of many variables changes a lot on average, and along which directions it is almost constant on average. The ideal family of functions with which to describe ASMs is that of ridge functions (not to be confused with ridge regression, an unrelated $\ell_2$ regularization technique), which are defined as functions that vary only in certain directions. To be precise, fix $x \in \mathcal{X} \subseteq \mathbb{R}^m$ and $A \in \mathbb{R}^{r \times m}$. Then a ridge function $f : \mathbb{R}^m \to \mathbb{R}$ is a function of the form

    f(x) = g(Ax)    (1)

for some function $g : \mathbb{R}^r \to \mathbb{R}$. Typically, $r < m$, so $g$ acts on a lower dimension than $m$. By discovering $A$, one exposes the low-dimensional structure in $f$. These functions underlie projection pursuit methods; see, for instance, the work of Friedman and Stuetzle (1981). Ridge functions are constant along $A$'s kernel. ASMs provide a framework for operating under looser assumptions: instead of the function being constant along $A$'s kernel, ASMs stipulate that the directional derivatives in directions belonging to this subspace are significantly smaller than those belonging to $A$'s range, and they make this assumption in expectation rather than absolutely.

Rigorous definition of the active subspace requires us to consider the matrix $C$, defined as the expected outer product of the gradient:

    C = \int_{\mathcal{X}} \nabla f(x) \, \nabla f(x)^\top \, d\mu(x) = \mathbb{E}_\mu\left[ (\nabla f)(\nabla f)^\top \right].    (2)

Here, $\mathcal{X} \subseteq \mathbb{R}^m$ is the domain (of interest) of $f$, and $\mu$ is an arbitrary measure. Little attention has been paid to specifying $\mu$ beyond the heuristic that the Lebesgue measure is in some sense the natural starting point for bounded $\mathcal{X}$, whereas the Gaussian is used for unbounded $\mathcal{X}$. In practice, $C$ is estimated by using a simple Monte Carlo procedure: sample $x_i \sim \mu$ for $i \in \{1, \ldots, M\}$, then calculate the gradient at each point $x_i$ to obtain the estimator

    \hat{C} = \frac{1}{M} \sum_{i=1}^{M} \nabla f(x_i) \, \nabla f(x_i)^\top.    (3)

The properties of this estimator, which we call the classical or MC estimator, have been analyzed by Constantine (2015) and Holodnak et al. (2018), along with guidelines for choosing $M$, the MC sample size. The matrix $C$ contains information about the gradient's direction on average through its eigendecomposition. In certain cases, one observes a jump in the eigenvalues: a "spectral gap." This is indicative of an active subspace, defined as the principal eigenspace corresponding to the eigenvalues prior to the jump.

Ideas similar to active subspaces have long been explored in the statistics community, often under the name of sufficient dimension reduction (SDR). Generally, these methods are applied in the context of observational data analysis, where an analyst desires, say, a visualization of high-dimensional data, whereas methods bearing the name active subspaces tend to have a UQ application in mind. We briefly outline some of the most popular SDR methods and refer the reader to the work of Ma and Zhu (2013) for a more complete review. Sliced inverse regression (Li, 1991), among the earliest methods, involves breaking the response into bins, taking the mean of each feature within each bin, and then performing principal component analysis on the bins to give important directions.
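As a concrete point of reference for the classical approach, the following minimal sketch (illustrative only, not code from the paper) applies the MC estimator (3) to a toy ridge function whose gradient is available analytically, then inspects the eigenvalue decay for a spectral gap. The toy function, sample size, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ridge-like function on [0, 1]^m: f(x) = g(a^T x), with g(t) = sin(t) + t^2.
m = 5
a = rng.standard_normal(m)                                # defines a one-dimensional ridge direction
f_grad = lambda x: (np.cos(a @ x) + 2.0 * (a @ x)) * a    # analytic gradient, lies in span{a}

# Classical MC estimator (3): average outer products of sampled gradients.
M = 200
X = rng.uniform(size=(M, m))                              # x_i ~ mu, here uniform on the unit cube
C_hat = np.zeros((m, m))
for x in X:
    g = f_grad(x)
    C_hat += np.outer(g, g)
C_hat /= M

# Eigendecomposition: a large gap after the first eigenvalue suggests a 1-D active subspace.
eigvals, eigvecs = np.linalg.eigh(C_hat)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]        # sort in decreasing order
print("eigenvalues:", np.round(eigvals, 6))
print("leading eigenvector (up to sign) vs. a/||a||:")
print(np.round(eigvecs[:, 0], 3), np.round(a / np.linalg.norm(a), 3))
```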
Arguing that eigenvectors corresponding to large eigenvalues represent important directions, Li (1992) presents a method for subspace dimension reduction achieved by estimating the expected Hessian $\nabla^2 f$. Computation involves solving a generalized eigenproblem on matrices whose size equals the dimension of the input space. Also typical have been methods based on estimating these or related quantities by using kernel smoothers. An important early article in this regard is that of Samarov (1993), who considered estimation of several functionals defined in terms of integrals, including the expected outer product of the gradient, defined exactly as $C$, as well as the expected Hessian. In the same article, sample estimators are constructed for such quantities by using kernel-regression-based estimators of the gradient at each sample location. Practical use of these techniques is complicated by the need to choose a kernel bandwidth; and although methods exist to this end, none may be considered "best," and different methods may lead to different results (Ghosh, 2018). Fukumizu and Leng (2014) alleviate some of those shortcomings but still rely on an MC estimator of $C$. Among kernel methods, GPs offer additional benefits by taking a more probabilistic point of view.

Gaussian processes are a common Bayesian nonparametric technique; see, for example, Rasmussen and Williams (2006) for an introduction. Given any set of pairs $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_i$ belonging to a set $\mathcal{X}$ representing inputs and $y_i \in \mathbb{R}$ representing outputs of some real-valued function defined on $\mathcal{X} \subseteq \mathbb{R}^m$, the GP models the output vector $y_n = (y_i)_{1 \le i \le n}$ as coming from a Gaussian distribution with a mean vector and covariance matrix dependent on $X_n = (x_i)_{1 \le i \le n}$. Typically, the mean is set to be a constant or a linear function of $X_n$, and the covariance matrix is determined via a kernel function, a positive definite function defined on $\mathcal{X} \times \mathcal{X}$, such that the covariance satisfies $\mathrm{Cov}[y_i, y_j] = k(x_i, x_j)$. GP regression (also known as kriging) may be interpreted as a Bayesian linear regression on an infinite-dimensional space determined by the chosen kernel function (Scholkopf and Smola, 2001).

For the purposes of this article, suppose we are given $n$ observations of a black-box function (i.e., function values are available, but gradient values are not; see, e.g., Forrester et al. (2008a)) $f : \mathcal{X} \to \mathbb{R}$, possibly depending on an active subspace of dimension $r$. Based on this data $\mathcal{A}_n = (x_i, y_i = f(x_i))_{1 \le i \le n}$, and considering a prior (zero-mean) Gaussian process $Y \sim GP(0, k(\cdot, \cdot))$, classical multivariate normal properties result in the GP predictive equations; that is, $Y \mid \mathcal{A}_n \sim \mathcal{N}(m_n(\cdot), k_n(\cdot, \cdot))$ with

    m_n(x) = k(x)^\top K_n^{-1} y_n,    (4)
    k_n(x, x') = k(x, x') - k(x)^\top K_n^{-1} k(x'),    (5)

where $k(x) = (k(x, x_i))_{1 \le i \le n}$ and $K_n = (k(x_i, x_j))_{1 \le i, j \le n}$.
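For readers wanting a concrete handle on the predictive equations (4)-(5), here is a minimal NumPy sketch (an assumption-laden illustration, not the paper's code) with a zero-mean GP, an isotropic Gaussian kernel, and hyperparameters fixed by hand rather than estimated.

```python
import numpy as np

def gauss_kernel(X1, X2, lengthscale=0.3, sigma2=1.0):
    """Gaussian (squared exponential) kernel matrix between rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(Xn, yn, Xstar, tau2=1e-8, **kw):
    """Posterior mean m_n (eq. 4) and covariance k_n (eq. 5) of a zero-mean GP."""
    Kn = gauss_kernel(Xn, Xn, **kw) + tau2 * np.eye(len(Xn))   # small nugget for noise/stability
    kx = gauss_kernel(Xstar, Xn, **kw)                         # rows are k(x)^T at each test point
    Kinv_y = np.linalg.solve(Kn, yn)
    mean = kx @ Kinv_y                                         # m_n(x) = k(x)^T K_n^{-1} y_n
    cov = gauss_kernel(Xstar, Xstar, **kw) - kx @ np.linalg.solve(Kn, kx.T)
    return mean, cov

rng = np.random.default_rng(1)
Xn = rng.uniform(size=(15, 2))
yn = np.sin(3 * Xn[:, 0]) + Xn[:, 1] ** 2                      # toy responses
Xstar = rng.uniform(size=(4, 2))
mean, cov = gp_predict(Xn, yn, Xstar)
print("posterior mean:", np.round(mean, 3))
print("posterior variances:", np.round(np.diag(cov), 4))
```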
Unfortunately, without modification, GPs are not generally suited for problems of more than a few tens of variables. A common remedy is to conduct some manner of dimension reduction and perform the kriging on this smaller space. Popular among these methods are linear embeddings, which involve projecting the input data onto a lower-dimensional subspace. This may be done randomly, as by Wang et al. (2016), or the embedding may be treated as kernel hyperparameters. The use of a rank-deficient matrix in GP kernels to reduce dimensionality is presented by Vivarelli and Williams (1999). That the matrix is rank-deficient means that its range has smaller dimension than the underlying space, resulting in dimension reduction. Tripathy et al. (2016) explicitly model the GP as existing in the low-dimensional space defined by $Ax$ for some $A$ (though this is easily seen to be equivalent to putting $A$ in the kernel) and propose a two-step approach that alternates between optimizing with respect to the matrix $A$ and the GP kernel hyperparameters. Since $A$ is constrained to be orthogonal for identifiability purposes, this involves a nontrivial optimization over Grassmann manifolds. Hokanson and Constantine (2018) describe a similar approach using a polynomial model on the active subspace, for which the two-step algorithm simplifies. Also taking the hyperparameter route are Garnett et al. (2014), who define an acquisition function for sequential design that minimizes uncertainty on the subspace. This work bears similarity to ours but differs substantially in that it fits a GP on a low-dimensional subspace to be estimated, whereas we fit a GP on the entire space and use properties of GPs to estimate the important subspace.

GPs via linear embeddings (GPLEs) and ASMs have much in common. Although links between GPs and ASMs have been observed previously, here we rigorously establish the classical ASM estimator $\hat{C}$ as a method of moments (MoM) estimator for the linear embedding matrix in GPLEs in Theorem 1.

Theorem 1. Assume we are given a mean-zero Gaussian process with the stationary kernel

    k(x, x') = \sigma^2 \exp\left\{ -\tfrac{1}{2} (x - x')^\top A^\top A (x - x') \right\},

that is, an anisotropic Gaussian kernel with variance parameter $\sigma^2$ and Mahalanobis distance given by the matrix $A^\top A$. Then the classical estimator $\hat{C}$ of (3) is a method of moments estimator for $\sigma^2 A^\top A$.

Proof. Since the active subspace is defined on the gradients of the function of interest, it is useful to derive the GP implied by the above model on the gradient. That the stochastic process implied by a GP on a function's partial derivatives is also a GP is well known (e.g., (Rasmussen and Williams, 2006, Chapter 9.4)). In particular, the covariance implied on partial derivatives of the GP is simply given by the corresponding entries of the Hessian of the kernel function,

    \mathrm{Cov}\left[ \partial Y(x)/\partial x_i, \; \partial Y(x')/\partial x'_j \right] = \frac{\partial^2 k(x, x')}{\partial x_i \, \partial x'_j}.

It is therefore useful to derive the Hessian of our kernel function. Given that the derivative is

    \frac{\partial k(x, x')}{\partial x} = -\sigma^2 A^\top A (x - x') \exp\left\{ -\tfrac{1}{2} (x - x')^\top A^\top A (x - x') \right\},

the Hessian can be expressed as

    \frac{\partial^2 k(x, x')}{\partial x \, \partial x'} = \sigma^2 \left[ A^\top A - A^\top A (x - x')(x - x')^\top A^\top A \right] \exp\left\{ -\tfrac{1}{2} (x - x')^\top A^\top A (x - x') \right\}.

Now turn to the quantity of interest,

    C = \mathbb{E}\left[ (\nabla f)(\nabla f)^\top \right] = \mathrm{Var}[\nabla f] + \mathbb{E}[\nabla f] \, \mathbb{E}[\nabla f]^\top.

Using the zero-mean assumption for our GP, the gradient's mean is then determined to be zero as well. This deletes the second term, leaving $C = \mathrm{Var}[\nabla f]$. For any stationary kernel, the quantity $\mathrm{Var}[\nabla f]$ at any point $x$ may be determined by our GP assumptions on $f$ as

    \mathrm{Var}[\nabla f(x)] = \left. \frac{\partial^2 k(x, x')}{\partial x \, \partial x'} \right|_{x' = x} = \sigma^2 A^\top A.

We have thus established that $\sigma^{-2} \hat{C}$ may be viewed as a MoM estimator for $A^\top A$. In our case, $r$ is the dimension of the linear subspace, generally chosen to be much less than $m$, and is assumed to give the rank of $A$ and hence of $A^\top A$. Note that the Gaussian kernel is also known as the squared exponential or radial basis function.

While $\hat{C}$ is an unbiased estimator for $\sigma^2 A^\top A$, we should not expect it to be consistent in the sense of infill asymptotics for a single function, similarly to how maximum marginal likelihood estimators for the length-scale parameters fail to be consistent because of the latter's non-microergodicity in the sense of (Stein, 1999, Chapter 6). As is often the case with MoM estimators, the range of this estimator will not always align with the possible values for the parameter it is estimating: the sample estimate will oftentimes be of rank much greater than $r$. In these cases, we can project the MoM estimator onto the set of rank-$r$ matrices via some suitable norm-induced projection operator. In the case of either the Frobenius norm or the spectral norm, the optimal rank-$r$ matrix is that formed by the leading $r$ eigenvectors of the matrix to be projected, by the Eckart-Young-Mirsky theorem (Eckart and Young, 1936; Mirsky, 1960), giving us exactly the active subspace solution.
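The identity Var[∇f(x)] = σ²AᵀA at the heart of Theorem 1 can be checked numerically. The sketch below (an illustration under assumed values of σ², A, and step size, not code from the paper) finite-differences the anisotropic Gaussian kernel across its two arguments at x = x' and compares against σ²AᵀA.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r, sigma2 = 4, 2, 1.7
A = rng.standard_normal((r, m))                 # embedding matrix as in Theorem 1
M = A.T @ A                                     # Mahalanobis matrix A^T A

def k(x, xp):
    d = x - xp
    return sigma2 * np.exp(-0.5 * d @ M @ d)    # anisotropic Gaussian kernel

# Cross-derivative d^2 k / dx_i dx'_j at x = x' via a forward-type finite-difference scheme.
x0 = rng.uniform(size=m)
h = 1e-3                                        # error is O(h); a few decimals of agreement expected
H = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        ei, ej = np.eye(m)[i], np.eye(m)[j]
        H[i, j] = (k(x0 + h * ei, x0 + h * ej) - k(x0 + h * ei, x0)
                   - k(x0, x0 + h * ej) + k(x0, x0)) / h**2

print("max abs error vs sigma^2 * A^T A:", np.abs(H - sigma2 * M).max())
```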
This estimator is unbiased by construction; and although its derivation was made in the context of GPs, the normality assumption was not used.

Estimating the matrix $C$ is not straightforward if gradients of the function $f$ are not available, as is often the case in practice. Along with finite-difference gradient estimation, several gradient-free methods have been proposed to deal with this issue given $n$ observations of design points stored in a matrix $X_n$ and the function $f$ evaluated at those points stored in a vector $y_n$. Perhaps the simplest of these ideas is proposed by Constantine (2015): simply perform linear regression of $y_n$ upon $X_n$ and look at the subspace generated by the estimated regression coefficient vector. This method, applicable only when a one-dimensional subspace is desired, was found to be adequate experimentally on several problems. Also proposed by Constantine (2015) is the use of local linear models, together with methods for aggregating their coefficient vectors. Another approach is to retain the MC estimator but to use surrogate modeling to estimate the gradient at each sampled point. Palar and Shimoyama (2017), Othmer et al. (2016), and Namura et al. (2017) use finite differencing on a GP or polynomial chaos expansion in order to estimate gradients at random points throughout the input space and then form the MC estimate of $C$. We now show that the GP assumptions make such a process unnecessary, since a closed-form estimate is available.

We now derive a closed-form estimator and a novel methodology for its use in sequential procedures. Our analytic estimator eases the computational burden and gives room for other procedures to be built on it. Recall our fundamental inference task of estimating $C = \mathbb{E}\left[(\nabla f)(\nabla f)^\top\right]$ given a set of observations $(X_n, y_n)$. Although $\mu$ will be the Lebesgue measure in our experiments (Section 4), we will neither make nor need the assumption that $X_n \sim \mu$. Given our Gaussian assumptions on the stochastic process, the expectation is available analytically in terms of standard functions for popular kernels.

Let us now express the matrix $C^{(n)}$ for a GP, starting with a fixed number of observations $n$, before providing sequential expressions. Recall that $K_n$ is the kernel matrix given $n$ observations: given a positive definite kernel function $k$, the components of $K_n$ are given by

    (K_n)_{i,j} = k(x_i, x_j) + \delta_{i,j} \tau^2,

where, in order to also account for noisy observations, $\delta$ is the Kronecker delta and $\tau^2$ is the error variance. Differentiation being a linear operator, it is well known that $\nabla Y = (\partial Y/\partial x_1, \ldots, \partial Y/\partial x_m)^\top$ is also a Gaussian process. Assuming that $k$ is twice differentiable, the joint distribution of $(Y(X_n), \partial Y(x)/\partial x_1, \ldots, \partial Y(x)/\partial x_m)$ is multivariate normal with zero mean and covariance blocks given by $K_n$, the kernel-gradient vectors $\kappa_i(x) := (\partial k(x, x_l)/\partial x_i)_{1 \le l \le n}$, and the cross derivatives $\partial^2 k(x, x')/\partial x_i \partial x'_j \big|_{x'=x}$, which leads to a closed-form expression for $C$, as given in Theorem 2.

Theorem 2. Let $k$ be a twice differentiable kernel, $W_{i,j} := \int_{\mathcal{X}} \kappa_i(x) \, \kappa_j(x)^\top \, d\mu(x)$, and $E_{i,j} := \int_{\mathcal{X}} \frac{\partial^2 k(x, x')}{\partial x_i \, \partial x'_j}\Big|_{x'=x} \, d\mu(x)$. Then

    C^{(n)}_{i,j} = E_{i,j} - \mathrm{tr}\left( K_n^{-1} W_{i,j} \right) + y_n^\top K_n^{-1} W_{i,j} K_n^{-1} y_n.

Proof. Based on Petersen et al. (2008, Equation (321)),

    C^{(n)} = \mathbb{E}\left[ \nabla Y(X) (\nabla Y(X))^\top \mid \mathcal{A}_n \right] = M + m m^\top,

with $X$ a random vector (e.g., with the same distribution as used for sampling the gradients). We use integration and expectations interchangeably. The first term is

    m = \mathbb{E}\left[ \nabla m_n(X) \right] = \mathbb{E}\left[ \kappa(X) \right] K_n^{-1} y_n = \left( \int_{\mathcal{X}} \kappa(x) \, d\mu \right) K_n^{-1} y_n,

where $\kappa(x)$ stacks the $\kappa_i(x)^\top$ as rows.
For the second one, $M$, we use the law of total covariance:

    M = \mathrm{Var}[\nabla Y(X)] = \mathbb{E}_X\left[ \mathrm{Var}\left( \nabla Y(X) \mid X, \mathcal{A}_n \right) \right] + \mathrm{Var}_X\left( \mathbb{E}\left[ \nabla Y(X) \mid X, \mathcal{A}_n \right] \right).

The result follows by writing $C^{(n)} = M + m m^\top$ entrywise and collecting the integrals into $E_{i,j}$ and $W_{i,j}$.

Given our focus here on sequential design, also of interest is an update formula for calculating $C^{(n+1)}$ (i.e., our estimate for $C$ given $\mathcal{A}_{n+1}$) accounting for the fact that we have already calculated $C^{(n)}$ using $K_n$ and $W^{(n)}_{i,j}$. The resulting update formula is given in Theorem 3.

Theorem 3. Denote $g(\tilde{x}) = -\sigma_n^2(\tilde{x})^{-1} K_n^{-1} k_n(\tilde{x})$ and $\sigma_n^2(\tilde{x}) = k_n(\tilde{x}, \tilde{x})$. Given a new design point $\tilde{x}$ but not the function value at this location (i.e., $y_{n+1} \sim \mathcal{N}(m_n(\tilde{x}), k_n(\tilde{x}, \tilde{x}))$), $C^{(n+1)}_{i,j}$ can be written as a quadratic polynomial in $Z \sim \mathcal{N}(0, 1)$, with linear coefficient $\beta_{i,j}(\tilde{x})$ and quadratic coefficient $\gamma_{i,j}(\tilde{x})$. Once the response $y_{n+1}$ has been observed, the expression evaluates to the corresponding value with $Z = (y_{n+1} - m_n(\tilde{x}))/\sigma_n(\tilde{x})$.

Proof. The derivation is detailed in Appendix B.

Note that in contrast to the classical MC estimator $\hat{C}$, by using a GP to predict everywhere, one can follow an "off-policy" approach: design points $X_n$ need not be sampled from $\mu$. Combined with the above update expression, this enables sequential design capabilities to learn active subspaces more efficiently.

At first glance, a disadvantage of kernel methods as compared with classical parametric models is the difficulty in interpreting model parameters. A popular heuristic for determining relative variable importance is that of automatic relevance determination (ARD) (Rasmussen and Williams, 2006, Chapter 5.1), which asserts that input dimensions with large length scales are of little importance. This is due to the limiting behavior of many popular kernels that asymptotically ignore an input dimension as its length scale tends to infinity. This is taken advantage of, for example, for the purpose of axis-aligned dimension reduction for optimization (Salem et al., 2019). For two finite length scales $l_1 > l_2$ belonging to input variables $x_1, x_2$, however, there is no guarantee that $x_2$ is more important than $x_1$. As we illustrate here with a simple example, examining the loadings of variables in the active subspace eigenbasis may be more informative.

Consider the function (6) illustrated in Figure 1. Along the $x_1$ dimension is a sinusoidal function with high frequency but low amplitude, and along the other is simply a quadratic function in $x_2$, which dominates in terms of determining the function value. Given 1,000 observations uniformly distributed on the 2-D unit square, a GP with a Gaussian kernel gives length-scale estimates (determined via maximum likelihood estimation) of 0.069 for $x_1$ and 0.37 for $x_2$, which, according to the ARD principle, mistakenly suggests $x_1$ is more important than $x_2$. Turning now to the active subspace estimated by the GP for the same sample, we find in the eigendecomposition of our estimated $\hat{C}$ that the first eigenvector represents $x_2$ while the second represents $x_1$, and that the first eigenvalue is an order of magnitude greater than the second, suggesting, correctly, that the variable $x_2$ is of greater importance. The effect of projecting along the eigenvectors is further illustrated in Figure 1.
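The ARD-versus-loadings contrast can be reproduced with off-the-shelf tools. The sketch below is illustrative only: the test function is a hypothetical stand-in with the same character as (6) (high-frequency, low-amplitude in x1; dominant quadratic in x2), not the paper's function, the GP is fit with scikit-learn rather than the authors' implementation, and the eigen-loadings of C are obtained here with the simple MC estimator (3) using analytic gradients rather than the closed-form GP expression.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

# Hypothetical stand-in for (6): wiggly but small in x1, smooth but dominant in x2.
f = lambda X: 0.01 * np.sin(30.0 * X[:, 0]) + X[:, 1] ** 2
grad_f = lambda x: np.array([0.01 * 30.0 * np.cos(30.0 * x[0]), 2.0 * x[1]])

X = rng.uniform(size=(300, 2))
y = f(X)

# ARD view: per-dimension length scales from an anisotropic Gaussian kernel.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 1.0]),
                              alpha=1e-8, normalize_y=True, n_restarts_optimizer=3)
gp.fit(X, y)
print("fitted ARD length scales:", gp.kernel_.length_scale)

# Active-subspace view: eigen-loadings of the (here MC-estimated) matrix C.
C_hat = np.mean([np.outer(grad_f(x), grad_f(x)) for x in X], axis=0)
eigvals, eigvecs = np.linalg.eigh(C_hat)
print("eigenvalues (ascending):", eigvals)
print("leading eigenvector:", eigvecs[:, -1])   # typically loads mostly on x2 for this stand-in
```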
Figure 1: The 2-D function (6) exhibits insignificant but high-frequency behavior in one dimension and significant but long-range behavior in the other. Filled black points are design points while red circles represent points evaluated on a regular grid for visualization. The active subspace method reveals a linear subspace along which most of the important changes occur (center), while the ARD principle would select the high-frequency dimension (right).

Of perhaps equal importance as giving an estimate for C is quantifying uncertainty in that estimate. Given an estimate, Constantine (2015) shows how the bootstrap method (Efron, 1981) can be used to create intervals by applying parametric bootstrapping to simple methods used to estimate active subspaces (e.g., OLS, LL).
However, application of the bootstrap to more sophisticated active subspace estimation methods such as the one presented in this article can lead to computational challenges. Instead, uncertainty of GP parameters can be propagated to uncertainty of C via a simple MC procedure. Myriad methods for quantifying GP hyperparameter uncertainty exist: a full posterior can be developed by using MCMC procedures, approximate posteriors can be determined (e.g., via variational methods), or the asymptotic normality of maximum likelihood estimators can be exploited. In light of its computational ease, the last method is explored here. This method involves computing the Hessian of the log likelihood at its maximum value, giving a covariance matrix for the parameters, and treating the maximizing parameter setting as the mean. This uncertainty on model parameters may be propagated to the active-subspace-defining singular values via MC: for each drawn parameter setting, estimate the singular values; then keep, for instance, the middle 95 percent of the draws. As an illustration, in Figure 2 we form intervals on the eigenvalues of the problem defined in (6). The downside of this approach is that, while it is speedy, it does not account for uncertainty as well as would a fully Bayesian approach leveraging MCMC.

Figure 2: The eigenvalues of the function defined in (6) are given interval estimates, expressing decreasing model uncertainty with samples of size 20 (left) and 40 (right). The Hessian expansion of the log-likelihood at its maximum gives a correlation of 0.16 between the two length-scale parameters. 95% and 99% confidence intervals are given in black and red, respectively, while the horizontal lines give the locations of the "true" eigenvalues estimated via classical MC techniques with 10,000 samples.
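A minimal sketch of this propagation step follows, assuming a routine log_lik(theta) returning the GP log likelihood and a routine estimate_C(theta) returning the closed-form C for a given hyperparameter setting; both are hypothetical placeholders here, and the Hessian is approximated by finite differences rather than computed analytically.

```python
import numpy as np

def laplace_eigenvalue_intervals(log_lik, estimate_C, theta_hat, n_draws=500,
                                 h=1e-4, level=0.95, seed=0):
    """Propagate a Gaussian (Laplace) approximation of hyperparameter uncertainty
    to the eigenvalues of C via Monte Carlo."""
    rng = np.random.default_rng(seed)
    p = len(theta_hat)

    # Finite-difference Hessian of the log likelihood at its maximizer theta_hat.
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = np.eye(p)[i] * h, np.eye(p)[j] * h
            H[i, j] = (log_lik(theta_hat + ei + ej) - log_lik(theta_hat + ei)
                       - log_lik(theta_hat + ej) + log_lik(theta_hat)) / h**2

    cov = np.linalg.inv(-H)                        # approximate covariance of the MLE
    draws = rng.multivariate_normal(theta_hat, cov, size=n_draws)

    # For each hyperparameter draw, recompute C and its eigenvalues.
    eig_draws = np.array([np.sort(np.linalg.eigvalsh(estimate_C(t)))[::-1] for t in draws])
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    return np.quantile(eig_draws, [lo, hi], axis=0)  # one interval per eigenvalue
```

In practice the approximate covariance may need regularization (e.g., a small jitter on its diagonal) when the Hessian is nearly singular.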
Equipped with a closed-form GP estimate of C for a fixed design, we turn to the problem of choosing the next design point to be evaluated, given our observed responses, so as to best learn about the active subspace. As do Labopin-Richard and Picheny (2018), we take the approach of selecting the design that maximizes the variance of our unknown quantity C at the next iteration. However, how to define this variance in our case is not clear. One could define subspace variance through ideas such as Fréchet variance (Fréchet, 1948) and seek to minimize it, but the dimension of the active subspace is not known ahead of time and would have to be determined by a human observer or via a heuristic at each step of the sequential design process. Also possible is to consider measuring the variance of the spectral gap, which is problematic for the same reasons. We instead focus on the variance of the matrix $C^{(n+1)}$, which we measure in three ways.

Since examination of C's spectrum is central to making decisions regarding the dimension of the active subspace, pinning down C's eigenvalues is of great interest. As such, we define the Trace acquisition function using the variance of the mean eigenvalue, which may be optimized by maximizing the variance of the trace of $C^{(n+1)}$:

    \mathrm{Trace}(\tilde{x}) = \mathrm{Var}\left[ \mathrm{tr}\left( C^{(n+1)} \right) \right].

Of practical concern is that the Trace function may be calculated without computing the off-diagonal elements of the C matrix, which seems to some degree antithetical to the principle of ASMs seeking non-axis-aligned directions of importance. We also consider two acquisition functions attempting to directly measure C's variance, which we term Var1 and Var2. In their definitions, $\odot$ represents elementwise (Hadamard) multiplication and $\|A\|_F$ denotes the Frobenius norm of $A$ (the square root of the sum of its squared elements). Of practical interest is that, again, closed-form expressions can be obtained for these three acquisition functions.

Theorem 4. Given the coefficients $\beta_{i,j}(\tilde{x})$ and $\gamma_{i,j}(\tilde{x})$ from Theorem 3 stored in matrices $B(\tilde{x})$ and $\Gamma(\tilde{x})$, the acquisition functions and their gradients are available in the closed forms shown in Table 1.

Proof. The derivation is detailed in Appendix C.

We note further that $B(\tilde{x})$ and $\Gamma(\tilde{x})$ have closed-form expressions for popular kernels, including the Gaussian, Matérn 3/2, and Matérn 5/2; see Appendix A. The details are given in the Appendix, where gradients are also discussed. A brief summary of the approach is given in Algorithm 1.

Algorithm 1 Pseudocode for the sequential learning approach
Require: n_0, criterion J(·) (e.g., Trace, Var1, Var2)
1: Construct and evaluate an initial design of experiments of size n_0 in X
2: Build initial GP model
3: while time/evaluation budget not exhausted do
4:     Find x̃* ∈ arg max_{x̃ ∈ X} J(x̃)
5:     Evaluate f(x̃*)
6:     Update the GP model based on new data
7: end while

We now turn to application examples to demonstrate the benefits of sequential design for active subspace learning. We illustrate the advantage of judicious design-point selection by comparing sequential GP subspace estimation with random GP estimation and Monte Carlo estimation, as well as approaches using a global linear model (OLS) or local linear models (LL). The task at hand being estimation of a subspace, we measure error in a model's prediction as the sine of the first principal angle between the true active subspace and the active subspace estimated at each potentially expensive function evaluation. The sine of the first principal angle between two subspaces $\mathcal{U}$ and $\hat{\mathcal{U}}$, denoted $\angle_1(\mathcal{U}, \hat{\mathcal{U}})$, may be computed as the spectral norm of a simple expression of two unitary matrices, the first, $U$, with range equal to $\mathcal{U}$ and the second, $\hat{U}_\perp$, with kernel equal to $\hat{\mathcal{U}}$ (Ji-guang, 1987, Lemma 3.1):

    \sin \angle_1(\mathcal{U}, \hat{\mathcal{U}}) = \| \hat{U}_\perp U \|_2.

This "subspace distance" gives us some notion of the largest possible angle to be made between the subspaces. All of the tested methods begin with the same Latin hypercube sample and choose their next points either randomly or based on an acquisition function.

We begin by learning a quadratic function defined by a rank-1 matrix $A$; that is,

    f(x) = x^\top A x, \quad A = a_1 a_1^\top.    (8)

The gradient of this function is $\nabla f(x) = 2Ax = 2 (a_1^\top x) a_1$, so gradients live entirely in the one-dimensional subspace given by the range of $A$. During each trial, a vector $a_1$ is generated with iid standard normal entries. The results of applying each algorithm to 50 randomly generated rank-1 $A$ matrices in dimensions m = 2, 5, and 8 are given in Figure 3. The initial design is 5 times the dimension of the space, and the number of points selected sequentially is 10 times the dimension, giving a total budget of 15 times the input dimension in terms of function evaluations.
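For concreteness, here is a small sketch (illustrative, not the paper's experiment code) of the error metric, the sine of the first principal angle, together with the rank-1 quadratic test problem (8); the crude finite-difference estimator used below is only there to exercise the metric and is an assumption of the example.

```python
import numpy as np

def subspace_distance(U, U_hat):
    """Sine of the first (largest) principal angle between span(U) and span(U_hat),
    computed as the spectral norm of the projection of U onto span(U_hat)'s complement."""
    Q, _ = np.linalg.qr(U)          # orthonormal basis for the true subspace
    Qh, _ = np.linalg.qr(U_hat)     # orthonormal basis for the estimated subspace
    resid = Q - Qh @ (Qh.T @ Q)     # (I - Qh Qh^T) Q
    return np.linalg.norm(resid, 2)

rng = np.random.default_rng(4)
m = 5
a1 = rng.standard_normal((m, 1))
A = a1 @ a1.T                       # rank-1 quadratic form, f(x) = x^T A x
f = lambda x: float(x @ A @ x)

# A crude estimate of the active subspace from noisy forward-difference gradients,
# just to exercise the metric; any estimator could be plugged in here.
def fd_grad(x, h=1e-4, noise=5e-5):
    fx = f(x) + noise * rng.standard_normal()
    return np.array([(f(x + h * e) + noise * rng.standard_normal() - fx) / h
                     for e in np.eye(m)])

X = rng.uniform(size=(20, m))
C_hat = np.mean([np.outer(fd_grad(x), fd_grad(x)) for x in X], axis=0)
U_hat = np.linalg.eigh(C_hat)[1][:, -1:]           # leading eigenvector
print("subspace error:", subspace_distance(a1, U_hat))
```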
To simulate numerical error or a stochastic experiment, we rerun the experiments, this time adding Gaussian noise to the objective with standard deviation τ = 5e-5. In the noiseless case, as soon as we see one gradient, we have the active subspace, so finite differencing (simple forward differencing with a step size of 10^{-4}) together with the Monte Carlo estimator gets the right answer after m + 1 evaluations. With slight numerical noise, however, the accuracy obtained by finite differencing degrades, and the advantage of intelligent selection of evaluation points is clear. The difference between the three selected definitions of the variance of $C^{(n+1)}$ can be observed in relation to the random infill, that is, randomly choosing the next point. Var1 and Var2 give similar results; Trace is in some cases barely better than random sampling. This indicates that, to pinpoint the active subspace, it is insufficient to focus on the uncertainty associated with the mean (or sum) of the eigenvalues. Recall that each method begins with the exact same design, and thus that discrepancies in accuracy at the beginning of the simulation are due to how the methods differ in exploiting the same available data. Since the classical MC estimator cannot avail itself of random function evaluations, it is given that many additional function evaluations to conduct finite differencing prior to the trials that appear in the figures.

Figure 3: The quadratic problem (8) in dimensions 2, 5, and 8. The x-axis represents computational effort in function evaluations while the y-axis represents the log of the mean (based on 50 trials) of the subspace error. The standard Monte Carlo estimator has zero error after a single gradient evaluation modulo minimal numerical noise in the deterministic case, so it is not pictured in column 1, but it would be at −∞ in theory and was found to be around 10^{-10} in practice.

The wing weight function, introduced by Forrester et al. (2008b), is a function of 10 inputs on a rectangular domain giving the weight of a light aircraft wing. Although it is a simple algebraic expression, it is of interest as a physically motivated test problem. Previously examined by the tutorial for the active subspaces Python package, a prominent one-dimensional active subspace was discovered. Here, we explore this function beginning with a design of 20 points and choose an additional 80 points via a sequential design policy or randomly. Since an analytic representation of the active subspace is unavailable, we measure the "true" active subspace by using finite differencing with the classical estimation technique with 10,000 function evaluations. Figure 4 illustrates the advantage of design-point selection via our acquisition criteria for smaller function evaluation budgets. The variances that account for non-axis-aligned terms of $C^{(n+1)}$ (i.e., in contrast to the Trace acquisition function) perform best.
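To tie the pieces together, the following schematic mirrors Algorithm 1; it is an assumption-laden sketch, not the paper's implementation, and fit_gp, estimate_C, and the acquisition J are hypothetical placeholders that could be backed, for instance, by the closed-form expressions of Theorems 2-4 or by the activegp package.

```python
import numpy as np

def sequential_active_subspace(f, bounds, J, fit_gp, estimate_C,
                               n0=20, budget=100, n_cand=2000, seed=0):
    """Algorithm 1, schematically: initial design, then acquisition-driven infill.
    f: black box; bounds: (m, 2) array; J(x_cand, gp): acquisition values on candidates;
    fit_gp(X, y): returns a fitted GP object; estimate_C(gp): current C estimate."""
    rng = np.random.default_rng(seed)
    m = bounds.shape[0]
    lo, hi = bounds[:, 0], bounds[:, 1]

    # Step 1: initial design (random here; a Latin hypercube would be used in practice).
    X = lo + (hi - lo) * rng.uniform(size=(n0, m))
    y = np.array([f(x) for x in X])

    # Step 2: initial GP model.
    gp = fit_gp(X, y)

    # Steps 3-7: sequential loop.
    while len(y) < budget:
        cand = lo + (hi - lo) * rng.uniform(size=(n_cand, m))   # candidate set
        x_star = cand[np.argmax(J(cand, gp))]                   # step 4: maximize acquisition
        X = np.vstack([X, x_star])                              # step 5: evaluate f
        y = np.append(y, f(x_star))
        gp = fit_gp(X, y)                                       # step 6: update the GP model

    return estimate_C(gp), X, y
```

A candidate-set maximization is used above purely for simplicity; since Table 1 provides gradients of the acquisition functions, a gradient-based optimizer over the design space is the more natural choice in practice.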
Figure 4: Results of a simulation on the wing weight function. Left: Function evaluations are represented on the x-axis, while subspace error at each iteration is shown on the y-axis. The advantage of sequential design for smaller function evaluation budgets is clear. Right: The value maximizing the acquisition function is given for the sequential strategies at each iteration.

In this article, we have shown that the surrogate-assisted estimation of active subspaces, a technique previously executed via Monte Carlo finite differencing, has a closed-form solution. This result was leveraged to create acquisition functions enabling evaluation-efficient estimation of such subspaces via sequential design, potentially saving (expensive) black-box evaluations. Although in this article we initialized our search with a standard Latin hypercube sample, choosing an initial design related to our acquisition criteria may be preferable, an issue we leave for future work. Active subspace methods being a field of active research, we leave the estimation via Gaussian processes of recent refinements, such as that proposed by Lee (2019), as a perspective. Future work could focus on problems specifically related to optimization, perhaps by considering a different measure to define the expected outer product of the gradient or by explicitly incorporating only those subspaces that play an important role in optimization, perhaps phrased in the framework of goal-oriented sensitivity analysis (Fort et al., 2016). Although not explored in this work, it is in principle straightforward to choose which points in the design space to explore in batches rather than one at a time. Such a problem, however, presents computational difficulty in the optimization of the acquisition function. One may have to develop approximate methods in order for this to be carried out in practice.

Expressions for $W_{i,j}$ are given for the cases $i \neq j$ and $i = j$. Gradients of $W_{i,j}$ with respect to designs (here, $x_1$ or $x_2$) are available by simply replacing the appropriate term in the product by its derivative. Closed-form expressions and gradients are available for certain $k$. Expressions for the Gaussian, Matérn 3/2 and 5/2 kernels, and gradients may be found in the Supplementary Materials; and C++ code implementing all expressions is available in the R package activegp.

We want to express $C^{(n+1)}$ as a function of $C^{(n)}$, $\tilde{x}$, and $y_{n+1}$. Ultimately we are also interested in its gradient. Reusing the notation of Section 3.1, $C^{(n+1)}_{i,j}$ is expressed in terms of $g(\tilde{x})$, $w_a(\tilde{x})$, $w_b(\tilde{x})$, and $w(\tilde{x}, \tilde{x})$, including the terms $-(w_a(\tilde{x}) + w_b(\tilde{x}))^\top g(\tilde{x}) - \sigma_n^2(\tilde{x})^{-1} w(\tilde{x}, \tilde{x})$. For the remaining term, partition inverse equations (Barnett, 1979) give

    K_{n+1}^{-1} = \begin{pmatrix} K_n^{-1} + G & g(\tilde{x}) \\ g(\tilde{x})^\top & \sigma_n^2(\tilde{x})^{-1} \end{pmatrix},

where $g(\tilde{x}) = -\sigma_n^2(\tilde{x})^{-1} K_n^{-1} k_n(\tilde{x})$ and $\sigma_n^2(\tilde{x}) = k_n(\tilde{x}, \tilde{x})$ as in (5), and where we denote $G = g(\tilde{x}) g(\tilde{x})^\top \sigma_n^2(\tilde{x})$. Decomposing and regrouping terms, we get the following.
Decomposing the quadratic form in the observations and regrouping terms, we first have
$$
K_{n+1}^{-1}\begin{pmatrix} y_n \\ y_{n+1} \end{pmatrix}
= \begin{pmatrix} K_n^{-1}+G & g(\tilde{x}) \\ g(\tilde{x})^\top & \sigma_n^2(\tilde{x})^{-1} \end{pmatrix}
\begin{pmatrix} y_n \\ y_{n+1} \end{pmatrix}
= \begin{pmatrix} (K_n^{-1}+G)\,y_n + y_{n+1}\,g(\tilde{x}) \\ g(\tilde{x})^\top y_n + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} \end{pmatrix},
$$
so that, writing $W^{(n+1)}_{i,j} = \begin{pmatrix} W^{(n)}_{i,j} & w_a(\tilde{x}) \\ w_b(\tilde{x})^\top & w(\tilde{x},\tilde{x}) \end{pmatrix}$, the observation-dependent part of $C^{(n+1)}_{i,j}$ is
$$
\begin{pmatrix} (K_n^{-1}+G)\,y_n + y_{n+1}\,g(\tilde{x}) \\ g(\tilde{x})^\top y_n + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} \end{pmatrix}^{\!\top}
W^{(n+1)}_{i,j}
\begin{pmatrix} (K_n^{-1}+G)\,y_n + y_{n+1}\,g(\tilde{x}) \\ g(\tilde{x})^\top y_n + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} \end{pmatrix}
$$
$$
\begin{aligned}
={}& y_n^\top (K_n^{-1}+G)\,W^{(n)}_{i,j}\,(K_n^{-1}+G)\,y_n
 + y_{n+1}\, y_n^\top (K_n^{-1}+G)\,W^{(n)}_{i,j}\, g(\tilde{x})
 + y_{n+1}\, g(\tilde{x})^\top W^{(n)}_{i,j}\,(K_n^{-1}+G)\,y_n
 + y_{n+1}^2\, g(\tilde{x})^\top W^{(n)}_{i,j}\, g(\tilde{x}) \\
&+ y_n^\top g(\tilde{x})\, w_b(\tilde{x})^\top (K_n^{-1}+G)\,y_n
 + y_n^\top g(\tilde{x})\, w_b(\tilde{x})^\top g(\tilde{x})\, y_{n+1}
 + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} w_b(\tilde{x})^\top (K_n^{-1}+G)\,y_n
 + y_{n+1}^2\,\sigma_n^2(\tilde{x})^{-1} w_b(\tilde{x})^\top g(\tilde{x}) \\
&+ y_n^\top (K_n^{-1}+G)\, w_a(\tilde{x})\, g(\tilde{x})^\top y_n
 + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} y_n^\top (K_n^{-1}+G)\, w_a(\tilde{x})
 + y_{n+1}\, g(\tilde{x})^\top w_a(\tilde{x})\, g(\tilde{x})^\top y_n
 + y_{n+1}^2\,\sigma_n^2(\tilde{x})^{-1} g(\tilde{x})^\top w_a(\tilde{x}) \\
&+ w(\tilde{x},\tilde{x})\, y_n^\top g(\tilde{x})\, g(\tilde{x})^\top y_n
 + w(\tilde{x},\tilde{x})\, y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} y_n^\top g(\tilde{x})
 + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1} w(\tilde{x},\tilde{x})\, g(\tilde{x})^\top y_n
 + y_{n+1}^2\,\sigma_n^2(\tilde{x})^{-2} w(\tilde{x},\tilde{x}).
\end{aligned}
$$
After replacing $G$ and some occurrences of $g(\tilde{x})$ by their expressions, combining with the directly updated part of $C_{i,j}$, and factorizing, we get the expression
$$
\begin{aligned}
C^{(n+1)}_{i,j} - C^{(n)}_{i,j}
={}& -\big(y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big)
     \left[\, y_n^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x})
          + k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} y_n \right] \\
&+ \big(y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big)
   \big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top
   \big(K_n^{-1} y_n + g(\tilde{x})\, y_{n+1} - g(\tilde{x})\, k_n(\tilde{x})^\top K_n^{-1} y_n\big)
 - \big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top g(\tilde{x}) \\
&+ \left[\big(y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big)^2 - \sigma_n^2(\tilde{x})^{-1}\right]
   \left[\, w(\tilde{x},\tilde{x}) + k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x}) \right].
\end{aligned}
$$
At step $n$, $y_{n+1} \sim \mathcal{N}\big(m_n(\tilde{x}), \sigma_n^2(\tilde{x})\big)$. Taking the expectation with respect to $y_{n+1}$ considerably simplifies these expressions (see below for the expressions of $\beta$ and $\gamma$). Indeed:
1) $\mathbb{E}\big[y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big] = y_n^\top g(\tilde{x}) + m_n(\tilde{x})\,\sigma_n^2(\tilde{x})^{-1} = 0$ by definition of $g(\tilde{x})$;
2) $\mathbb{E}\big[\big(y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big)^2\big] = \sigma_n^2(\tilde{x})^{-2} m_n(\tilde{x})^2 - 2\,\sigma_n^2(\tilde{x})^{-2} m_n(\tilde{x})^2 + \sigma_n^2(\tilde{x})^{-2}\big(m_n(\tilde{x})^2 + \sigma_n^2(\tilde{x})\big) = \sigma_n^2(\tilde{x})^{-1}$;
3) $\mathbb{E}\big[y_{n+1}\big(y_n^\top g(\tilde{x}) + y_{n+1}\,\sigma_n^2(\tilde{x})^{-1}\big)\big] = -m_n(\tilde{x})^2\,\sigma_n^2(\tilde{x})^{-1} + \sigma_n^2(\tilde{x})^{-1}\big(m_n(\tilde{x})^2 + \sigma_n^2(\tilde{x})\big) = 1$.
Notice that if we use the plug-in estimator of $y_{n+1}$, namely $m_n(\tilde{x})$, then
$$
C^{(n+1)}_{i,j} = C^{(n)}_{i,j} - \left[\sigma_n^2(\tilde{x})^{-1} k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x}) + \big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top g(\tilde{x}) + \sigma_n^2(\tilde{x})^{-1} w(\tilde{x},\tilde{x})\right],
$$
which reduces to the integrated mean square prediction error of the corresponding derivatives. For further simplification, denote by $Z = \frac{y_{n+1} - m_n(\tilde{x})}{\sigma_n(\tilde{x})}$, in other words, an $\mathcal{N}(0,1)$ random variable. Then $C^{(n+1)}_{i,j} - C^{(n)}_{i,j}$ can be rewritten as
$$
\begin{aligned}
&-\big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top g(\tilde{x})
 - \sigma_n^2(\tilde{x})^{-1}\left[\, w(\tilde{x},\tilde{x}) + k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x}) \right] \\
&- Z\,\sigma_n(\tilde{x})^{-1}\left[\, y_n^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x})
   + k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} y_n
   - \big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top K_n^{-1} y_n \right] \\
&+ Z^2\,\sigma_n^2(\tilde{x})^{-1}\left[\, w(\tilde{x},\tilde{x}) + k_n(\tilde{x})^\top K_n^{-1} W^{(n)}_{i,j} K_n^{-1} k_n(\tilde{x})
   - \big(w_a(\tilde{x}) + w_b(\tilde{x})\big)^\top K_n^{-1} k_n(\tilde{x}) \right].
\end{aligned}
$$
Now denote the coefficient of the linear $Z$ term as $\beta_{i,j}$ and that of the quadratic $Z$ term as $\gamma_{i,j}$, and further denote the matrices containing these as entries, given $\tilde{x}$, as $B(\tilde{x})$ and $\Gamma(\tilde{x})$, respectively. We further define 3D arrays $\partial B(\tilde{x})$ and $\partial\Gamma(\tilde{x})$ such that $\partial B(\tilde{x})_d$ gives the matrix of derivatives of each element of $B(\tilde{x})$ with respect to the $d$th element of $\tilde{x}$. We are now equipped to derive closed-form expressions for our infill criteria, as well as gradients of those expressions with respect to design locations, the results of which are given in Table 1.
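For intuition on why this decomposition is convenient, note that the first two moments of any scalar of the form $\alpha + \beta Z + \gamma Z^2$ with $Z \sim \mathcal{N}(0,1)$ are available in closed form; the specific acquisition functions built from $B(\tilde{x})$ and $\Gamma(\tilde{x})$ are those of Table 1 and are not restated here:
$$
\mathbb{E}\big[\alpha + \beta Z + \gamma Z^2\big] = \alpha + \gamma, \qquad
\operatorname{Var}\big(\alpha + \beta Z + \gamma Z^2\big) = \beta^2\operatorname{Var}(Z) + \gamma^2\operatorname{Var}(Z^2) = \beta^2 + 2\gamma^2,
$$
since $Z$ and $Z^2$ are uncorrelated ($\mathbb{E}[Z^3] = 0$) and $\operatorname{Var}(Z^2) = \mathbb{E}[Z^4] - \mathbb{E}[Z^2]^2 = 2$. Applied entrywise with $\beta_{i,j}$ and $\gamma_{i,j}$, identities of this kind are what make variance-based criteria on $C^{(n+1)}$ computable without Monte Carlo over $y_{n+1}$.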
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

References:
Matrix Methods for Engineers and Scientists
Dynamic Programming
Replication or Exploration? Sequential Design for Stochastic Simulation Experiments
Active Subspaces
Active Subspace Methods in Theory and Practice: Applications to Kriging Surfaces
Computing Active Subspaces Efficiently with Gradient Sketching
Estimation of the Derivative-Based Global Sensitivity Measures Using a Gaussian Process Metamodel
High-Dimensional Gaussian Process Bandits
Additive Kernels for Gaussian Process Modeling
ANOVA Kernels and RKHS of Zero Mean Functions for Model-Based Sensitivity Analysis
Additive Gaussian Processes
The Approximation of One Matrix by Another of Lower Rank
Nonparametric Estimates of Standard Error: The Jackknife, the Bootstrap and Other Methods
Engineering Design via Surrogate Modelling: A Practical Guide
Engineering Design via Surrogate Modelling - A Practical Guide
New Sensitivity Analysis Subordinated to a Contrast
Les Éléments Aléatoires de Nature Quelconque dans un Espace Distancié
Projection Pursuit Regression
Gradient-Based Kernel Dimension Reduction for Regression
Active Learning of Linear Embeddings for Gaussian Processes
Kernel Smoothing: Principles, Methods and Applications
Data-Driven Polynomial Ridge Approximation Using Variable Projection
A Probabilistic Subspace Bound with Application to Active Subspaces
A Review on Global Sensitivity Analysis Methods
Perturbation of Angles between Linear Subspaces
Sequential Design of Experiments for Estimating Percentiles of Black-Box Functions
Derivative-Free Optimization Methods
Modified Active Subspaces Using the Average of Gradients
Sliced Inverse Regression for Dimension Reduction
On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma
A Review on Dimension Reduction
Symmetric Gauge Functions and Unitarily Invariant Norms
Estimating Derivatives of Noisy Simulations
Kriging Surrogate Model with Coordinate Transformation Based on Likelihood and Gradient
On Active Subspaces in Car Aerodynamics
Exploiting Active Subspaces in Global Optimization: How Complex is Your Problem (Structural Dynamics, and Materials Conference)
The Matrix Cookbook
Gaussian Processes for Machine Learning
Sequential Dimension Reduction for Learning Features of Expensive Black-Box Functions
Exploring Regression Structure Using Nonparametric Functional Estimation
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Interpolation of Spatial Data
Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments
Bayesian Gaussian Process Latent Variable Model
Gaussian Processes with Built-in Dimensionality Reduction: Applications to High-Dimensional Uncertainty Propagation
Dimension Reduction for Aerodynamic Design Optimization
Discovering Hidden Features with Gaussian Processes Regression
Bayesian Optimization in a Billion Dimensions via Random Embeddings
Mathematica, Version 12.0

Supplementary Materials:
Additional Kernel Expressions and Derivation: detailed update derivation and kernel expressions for the Matérn 3/2 and 5/2 kernels, as well as gradients for all kernel expressions. (PDF)
R-package for Sequential Active Subspace UQ: R package activegp containing code implementing the methods described in this article. (GNU Tar file)

Here we provide specific formulae to evaluate X