title: Statistical Inference for Linear Mediation Models with High-dimensional Mediators and Application to Studying Stock Reaction to COVID-19 Pandemic
authors: Guo, Xu; Li, Runze; Liu, Jingyuan; Zeng, Mudong
date: 2021-08-27

Abstract: Mediation analysis draws increasing attention in many scientific areas such as genomics, epidemiology and finance. In this paper, we propose new statistical inference procedures for high-dimensional mediation models, in which both the outcome model and the mediator model are linear with high-dimensional mediators. Traditional procedures for mediation analysis cannot be used to make statistical inference for high-dimensional linear mediation models due to the high dimensionality of the mediators. We propose an estimation procedure for the indirect effects of the models via a partial penalized least squares method, and further establish its theoretical properties. We further develop a partial penalized Wald test for the indirect effects, and prove that the proposed test has a $\chi^2$ limiting null distribution. We also propose an $F$-type test for direct effects and show that the proposed test asymptotically follows a $\chi^2$-distribution under the null hypothesis and a noncentral $\chi^2$-distribution under local alternatives. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed tests and to compare their performance with existing ones. We further apply the newly proposed statistical inference procedures to study stock reaction to the COVID-19 pandemic via an empirical analysis of the mediation effects of financial metrics that bridge a company's sector and its stock return.

Since the seminal work of Baron & Kenny (1986), mediation analysis has been used in various areas of scientific research, such as economics, psychology, pedagogy, and behavioral science (Conti et al., 2016; Chernozhukov et al., 2021; Mackinnon, 2008; Hayes, 2013; Vanderweele, 2015). It is designed to investigate the mechanisms whereby exposure variables affect an outcome through intermediate variables, termed mediators. For instance, in the field of policy evaluation, while there certainly is no shortage of techniques assessing effects of policies or other treatments on an outcome (Imbens, 2004; Donald & Hsu, 2014; Athey et al., 2018; Ai et al., 2021), mediation analyses move a step further and disentangle such effects into indirect effects through mediators, such as certain economic indices, and direct effects. Statistical inference procedures for mediation models with low-dimensional mediators have been extensively studied (Preacher & Hayes, 2008; Vanderweele & Vansteelandt, 2014); see Ten Have & Joffe (2010) and Preacher (2015) for brief reviews of inference under low-dimensional mediation models. With modern data-collecting technology, mediation analysis has extended its territory to quantitative finance, genomics, internet analysis, biomedical research, and other data-intensive fields. This brings in high-dimensional mediators and calls for attention to the high-dimensional mediation model (HDMM), where the number of potential mediators is much larger than the sample size. Our work is motivated by such a high-dimensional mediation structure arising in the study of the effects of a company's sector on its stock return via various financial metrics during the COVID-19 period.
Direct effects of sectors, as well as of financial statements, on stock performance have been extensively studied in the literature; see, for instance, Fama & French (1993); Graham et al. (2002); Callen & Segal (2004); Edirisinghe & Zhang (2008); Dimitropoulos & Asteriou (2009); Fama & French (2015); Khan & Khokhar (2015); Enke & Thawornwong (2005); Huang et al. (2019). Yet, as evidently shown by the empirical analysis in section 3.2, companies' sectors also significantly affect stock returns indirectly through certain financial metrics in the statements. In our analysis, 550 financial metrics are involved, based on only 490 companies, resulting in high-dimensional mediators. The high dimensionality poses both computational and statistical challenges for carrying out efficient mediation analysis; for instance, traditional structural equation modeling fails due to the rank deficiency of the observed covariance matrix. Notwithstanding the high-dimensional mediation structure, however, the number of truly active mediators is typically assumed to be small and less than the sample size. This is referred to as the sparsity assumption in the literature, although the sparsity pattern is unknown and thus has to be recovered; see, for example, Fan et al. (2020) and references therein. Many existing methods circumvent this obstacle by utilizing dimension reduction techniques for regular linear models. For example, Huang & Pan (2016) and Chen et al. (2018) adopted principal component analysis to compress the dimensionality of the mediators, and applied the bootstrap for inference. These methods are intuitive and simple to implement, but lack theoretical justification of the asymptotic distributions of the test statistics. As an extension of Huang & Pan (2016), Zhao et al. (2020) further introduced sparse principal component analysis to mediation models. Zhang et al. (2016) used a two-stage technique: (a) first screening out "unimportant" mediators, and then (b) applying existing procedures to the post-screened outcome model. Zhou et al. (2020) introduced debiased penalized estimators for the direct and indirect effects, with theoretical guarantees for the related tests. However, their method involves estimating high-dimensional matrices, leading to potentially unstable estimates and expensive computation. Furthermore, imposing penalization on all parameters reduces the efficiency of the estimators, and hence of the tests. There are many further developments on this topic in the recent literature (Chakrabortty et al., 2018; Derkach et al., 2019; Song et al., 2020). In this paper, we propose new statistical inference procedures for the HDMM. Statistical inference for high-dimensional data has been an active research topic (Belloni et al., 2014; Zhang & Zhang, 2014; van de Geer et al., 2014; Javanmard & Montanari, 2014; Shi et al., 2019; Fan et al., 2020a,b). However, there is much less work on statistical inference for the HDMM. To the best of our knowledge, Zhou et al. (2020) is the only work on testing hypotheses about the indirect effect with solid theoretical analysis. Our inference procedure for the indirect effect is distinguished from Zhou et al. (2020) in that we observe that the indirect effect in the HDMM is in fact a low-dimensional parameter: it is the difference between the total effect and the direct effect in the HDMM.
This motivates us to estimate the total effect via the least squares method and the direct effect via a partial penalized least squares method, and then to estimate the indirect effect by the difference between the estimates of the total effect and the direct effect. We establish the asymptotic normality of the indirect effect estimate and further develop a Wald test for the indirect effect. We estimate the direct effect in the HDMM by the partial penalized least squares method, and propose an F-type test for it. Statistical inference on the direct effect is essentially the same as statistical inference on low-dimensional coefficients in high-dimensional linear models. This topic has been studied under the setting in which the covariate vector in the high-dimensional linear model is treated as a fixed design (Zhang & Zhang, 2014; van de Geer et al., 2014; Shi et al., 2019). Due to the nature of the HDMM, however, the design matrix in the HDMM must be random rather than fixed, since the mediators are random. Thus, the statistical setting studied in this paper is different from the one in Shi et al. (2019), in which the covariate vector is assumed to be a fixed design. We study the asymptotic properties of the proposed estimator in the random-design setting. The random design imposes challenges in deriving the rate of convergence and the asymptotic normality of the partial penalized least squares estimates. Under mild regularity conditions, we prove the sparsity and establish the rate of convergence of the partial penalized least squares estimate. We further establish an asymptotic representation of the estimate, from which we can easily derive its asymptotic normality as well as the asymptotic distributions of the proposed test for the direct effect under the null hypothesis and under local alternatives. We show that the proposed estimate of the indirect effect is asymptotically more efficient than the one proposed in Zhou et al. (2020), and is indeed asymptotically efficient under a normality assumption. This is because the debiasing step of the debiased Lasso inflates the asymptotic variance of the resulting estimate. We conduct Monte Carlo simulation studies to assess the finite sample performance of the proposed estimate in terms of bias and variance and to examine the Type I error and power of the proposed tests. We also conduct numerical comparisons among the proposed estimate, the oracle estimate and the estimate proposed in Zhou et al. (2020). Our numerical comparison indicates that the proposed estimate performs as well as the oracle one, and outperforms the estimate proposed by Zhou et al. (2020). We utilize the proposed method to study the mediator role of financial metrics that bridge a company's sector and its stock return. We select six financial metrics, out of all 550, that indeed mediate the pathways linking company sector and stock return, with interesting and informative financial interpretations. We also compare the metrics selected using our data from the COVID-19 period with classical findings in existing works, including Fama & French (2015) and Edirisinghe & Zhang (2008), among others, and we indeed discover some unique patterns and features due to the pandemic. Moreover, according to the proposed tests for the effects of sector, both its direct effect and its indirect effect via financial metrics are statistically significant.
Therefore, evaluating the selected financial metrics, as well as the sector information, might help investors make wiser investment decisions and choose stocks, especially during the pandemic. The rest of this paper is organized as follows. In section 2, we propose a new statistical inference procedure for the indirect effect and establish its theoretical properties; we also construct an F-type test for the direct effect. Section 3 presents numerical studies and a real data example. Conclusion and discussion are given in section 4. All proofs are presented in the Appendix.

Consider the mediation models
$y = \alpha_0^T m + \alpha_1^T x + \varepsilon_1$, (2.1)
$m = \Gamma^T x + \varepsilon$, (2.2)
where $y$ is the outcome, $m$ is the $p$-dimensional mediator, $x$ is the $q$-dimensional exposure variable, and $a^T$ denotes the transpose of $a$. We assume in this paper that $p$ is high dimensional, while $q$ is fixed and finite. Correspondingly, $\alpha_0$ and $\alpha_1$ are $p$- and $q$-dimensional regression coefficient vectors, and $\Gamma$ is a $q \times p$ coefficient matrix. Following the literature on high-dimensional mediation models (Zhang et al., 2016; van Kesteren & Oberski, 2019; Zhou et al., 2020), we impose a sparsity assumption that only a small proportion of entries in $\alpha_0$ are nonzero, which implies that only the corresponding variables in $m$ are actually relevant to $y$. Notably, from equation (2.2), $m$ must be random. We further assume that $\varepsilon_1$ and $\varepsilon$ are independent random errors with $var(\varepsilon_1) = \sigma_1^2$ and $cov(\varepsilon) = \Sigma^*$; $\varepsilon_1$ is independent of $m$ and $x$, and $\varepsilon$ is independent of $x$. Plugging (2.2) into (2.1) yields
$y = \gamma^T x + \varepsilon_3$, (2.3)
where $\beta = \Gamma\alpha_0$, $\varepsilon_2 = \alpha_0^T\varepsilon$ with $var(\varepsilon_2) = \sigma_2^2 = \alpha_0^T\Sigma^*\alpha_0$, $\gamma = \beta + \alpha_1$, and $\varepsilon_3 = \varepsilon_1 + \varepsilon_2$ is the total random error. Following the literature (Imai et al., 2010; Vanderweele & Vansteelandt, 2014), we refer to $\beta$ as the indirect effect of $x$ on $y$ mediated by $m$, to $\alpha_1$ as the direct effect, and to $\gamma = \alpha_1 + \beta$ as the total effect. A causal interpretation of $\beta$ and $\alpha_1$ is briefly discussed in the Appendix. In practice, it is of interest to test whether there exists a significant (joint) indirect effect. This can be formulated as the following hypothesis testing problem:
$H_{01}: \beta = 0$ versus $H_{11}: \beta \neq 0$. (2.4)
When both $p$ and $q$ are finite, $\beta$ can be estimated through $\hat\beta = \hat\Gamma\hat\alpha_0$, where $\hat\Gamma$ and $\hat\alpha_0$ are $\sqrt n$-consistently estimated from models (2.1) and (2.2). That is, $\hat\Gamma = \Gamma + E_\gamma$ and $\hat\alpha_0 = \alpha_0 + e_\alpha$, where $E_\gamma = O_P(1/\sqrt n)$ and $e_\alpha = O_P(1/\sqrt n)$ are estimation errors. Then
$\|\hat\beta - \beta\| = \|E_\gamma\alpha_0 + \Gamma e_\alpha + E_\gamma e_\alpha\| = O_P(1/\sqrt n)$, (2.5)
where $\|\cdot\|$ stands for the Euclidean norm. When $p$ is high-dimensional, however, the right-hand side of (2.5) is no longer $O_P(1/\sqrt n)$. This results in a potentially non-ignorable estimation error of $\hat\beta$. Moreover, $\beta$ is challenging to estimate through $\hat\Gamma\hat\alpha_0$, as this involves estimating a high-dimensional matrix and a high-dimensional vector, even though, interestingly, $\beta = \Gamma\alpha_0$ is $q$-dimensional, fixed and finite. As a key observation from (2.3), the indirect effect $\beta = \gamma - \alpha_1$ is the difference between the total effect and the direct effect. This motivates us to estimate $\beta$ by separately estimating $\gamma$ via (2.3) and $\alpha_1$ via (2.1), rather than estimating the high-dimensional $\Gamma$ and $\alpha_0$. Suppose that $\{m_i, x_i, y_i\}$, $i = 1, \cdots, n$, is a random sample from (2.1) and (2.2). Let $y = (y_1, \cdots, y_n)^T$ and $X = (x_1, \cdots, x_n)^T$. Then we estimate $\gamma$ by its least squares estimate
$\hat\gamma = (X^TX)^{-1}X^Ty$. (2.6)
As for the estimator of $\alpha_1$, due to the high dimensionality of $\alpha_0$, we propose the following partial penalized least squares method:
$(\hat\alpha_1, \hat\alpha_0) = \arg\min_{\alpha_1, \alpha_0} \frac{1}{2}\|y - M\alpha_0 - X\alpha_1\|^2 + n\sum_{j=1}^{p}p_\lambda(|\alpha_{0j}|)$, (2.7)
where $M = (m_1, \cdots, m_n)^T$ and $p_\lambda(\cdot)$ is a penalty function with a tuning parameter $\lambda$. The regularization is applied only to the high-dimensional yet sparse $\alpha_0$.
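To make the estimation strategy concrete, the following Python sketch implements an $L_1$-penalized version of (2.7), which is also the initializer of the LLA algorithm described in section 2.4, together with $\hat\beta = \hat\gamma - \hat\alpha_1$. Since $\alpha_1$ is unpenalized, it can be profiled out: projecting $X$ out of both $y$ and $M$ reduces (2.7) to a penalized regression on the projected data. This is a minimal sketch under our own conventions, not the authors' implementation; in particular, the correspondence between sklearn's penalty level `lam` and $\lambda$ in (2.7) depends on the normalization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def partial_penalized_fit(y, M, X, lam):
    """L1 version of the partial penalized least squares (2.7):
    alpha_0 is penalized, alpha_1 is not.  A sketch, not the authors' code."""
    # Profiling out the unpenalized alpha_1 amounts to projecting X out.
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix of X
    y_t, M_t = y - H @ y, M - H @ M
    alpha0 = Lasso(alpha=lam, fit_intercept=False).fit(M_t, y_t).coef_
    # Recover the unpenalized alpha_1 by least squares on the residual.
    alpha1 = np.linalg.solve(X.T @ X, X.T @ (y - M @ alpha0))
    return alpha1, alpha0

def indirect_effect_estimate(y, M, X, lam):
    """beta_hat = gamma_hat - alpha1_hat, following (2.3), (2.6) and (2.7)."""
    gamma = np.linalg.solve(X.T @ X, X.T @ y)      # total effect, OLS as in (2.6)
    alpha1, alpha0 = partial_penalized_fit(y, M, X, lam)
    return gamma - alpha1, alpha1, alpha0
```

The SCAD-penalized fit via LLA and the data-driven choice of the tuning parameter are sketched in section 2.4 below.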
We opt not to penalize $\alpha_1$, in order to achieve local power for the direct effect $\alpha_1$ and the indirect effect $\beta$ under local alternatives; see Theorem 2 and Corollary 1 below for more details. Thus, our proposal is different from Zhou et al. (2020), in which the central idea is to develop a debiased estimator not of $\alpha_0$ or $\beta$, but of $\Sigma_{XM}\alpha_0$. This may lead to less efficient estimators due to debiasing, as discussed in the next subsection.

In this section, we investigate the statistical properties of the estimators. We first present some notation and assumptions. For the penalty function, it is assumed that $p_\lambda(t_0)$ is increasing and concave in $t_0 \in [0, \infty)$, and has a continuous derivative $p'_\lambda(t_0)$ with $p'_\lambda(0+) > 0$. Denote $\rho(t_0, \lambda) = p_\lambda(t_0)/\lambda$, where $\rho'(0+, \lambda)$ does not depend on $\lambda$. Define $\bar\rho(v, \lambda) = \{sgn(v_1)\rho'(|v_1|, \lambda), \cdots, sgn(v_l)\rho'(|v_l|, \lambda)\}^T$ for any vector $v = (v_1, \cdots, v_l)^T$, where $sgn(\cdot)$ is the sign function. Define the local concavity of $\rho(\cdot)$ at $v$ as
$\kappa(\rho, v) = \lim_{\epsilon \to 0+} \max_{1 \le j \le l} \sup_{t_1 < t_2 \in (|v_j| - \epsilon, |v_j| + \epsilon)} -\{\rho'(t_2, \lambda) - \rho'(t_1, \lambda)\}/(t_2 - t_1)$.
Let $\theta = (\alpha_1^T, \alpha_0^T)^T$, and let $\theta_0$ denote the true value of $\theta$. Further let $\hat\theta = (\hat\alpha_1^T, \hat\alpha_0^T)^T$ be the estimator of $\theta_0$. Denote $A = \{j : \alpha_{0j} \neq 0\}$, and let $s = |A|$ be the number of elements in $A$. Moreover, $\vartheta = (\alpha_1^T, \alpha_{0,A}^T)^T$, and $\vartheta_0$ and $\hat\vartheta$ are defined similarly. Let $M_j$ denote the $j$th column of $M$, and let $M_A$ be the submatrix of $M$ formed by the columns in $A$; $m_{i,A}$ is the $i$th column of the matrix $M_A^T$. Similarly, let $\alpha_{0,A}$ be the subvector of $\alpha_0$ formed by the elements in $A$. Define $A^c = \{1, \cdots, p\} \setminus A$, the complement set of $A$. In this paper, for any vector $v$, $\|v\|_2$ and $\|v\|_\infty$ denote its Euclidean norm and sup-norm, and $\lambda_{min}(A)$ and $\lambda_{max}(A)$ denote the minimum and maximum eigenvalues of a matrix $A$, respectively. We impose the following conditions:

A1. There exist constants $0 < c \le C < \infty$ such that $c \le \lambda_{min}(\Sigma) \le \lambda_{max}(\Sigma) \le C$, where $\Sigma = E(ww^T)$ with $w = (x^T, m_A^T)^T$.

A2. Let $d_n$ be the half minimum signal of $\alpha_{0,A}$, i.e., $d_n = \min_{j \in A}|\alpha_{0j}|/2$. Assume that $d_n \gg \sqrt{s/n}$, $p'_{\lambda_n}(d_n) = o(n^{-1/2}s^{-1/2})$, and $\lambda_n\kappa_0 = o(1)$, where $\kappa_0 = \max_{\delta \in N_0}\kappa(\rho, \delta)$ with $N_0 = \{\delta \in R^s : \|\delta - \alpha_{0,A}\|_\infty \le d_n\}$.

A3. For some $\ell > 2$, there exists a positive sequence $K_n$ such that $E[\|m_{A^c}\varepsilon_1\|_\infty^{\ell}] \le K_n^{\ell}$ and $K_n^2\log p/n^{1-2/\ell-\varsigma} \to 0$ for some arbitrarily small $\varsigma > 0$. Further assume that $\max_{1 \le j \le p+q}E(z_j^4) < C < \infty$, where $z = (m^T, x^T)^T$ and $z_j$ is the $j$th component of $z$.

To emphasize the dependence on the sample size, in the above conditions and in the Appendix we use $\lambda_n$ to denote the tuning parameter. The first two conditions are mild and commonly assumed; see, for instance, Fan & Lv (2011). Condition A2 imposes a minimal signal condition on the nonzero elements of $\alpha_0$, but not on $\alpha_1$. Since our primary interest is statistical inference on the direct effect $\alpha_1$ and the indirect effect $\beta = \gamma - \alpha_1$, while $\alpha_0$ may be treated as a nuisance parameter in this model, Condition A2 is reasonable in practice. Condition A3 is imposed to establish the sparsity result. Compared with the existing literature, A3 is very mild; in fact, to simplify the proofs, some papers assume that all covariates are uniformly bounded, see for instance Wang et al. (2012). Under the bounded-covariates condition, A3 reduces to $E(|\varepsilon_1|^{\ell}) \le C$ by taking $K_n$ as a constant. Furthermore, by conditions A2 and A3, the dimension $p$ is allowed to be of an exponential order of the sample size $n$.

Theorem 1 Suppose that Conditions (A1)-(A3) hold and $s = o(n^{1/2})$. Then, with probability tending to 1, $\hat\alpha_0$ must satisfy (i) $\hat\alpha_{0,A^c} = 0$; (ii) $\|\hat\alpha_{0,A} - \alpha_{0,A}\|_2 = O_P(\sqrt{s/n})$; and (iii) $\sqrt n(\hat\vartheta - \vartheta_0) = \Sigma^{-1}n^{-1/2}L'(\vartheta_0) + o_P(1)$, where $L'(\cdot)$ is the gradient of the least squares loss defined in the Appendix.

The above results provide the sparsity of $\hat\alpha_0$, the convergence rate of $\hat\alpha_{0,A}$, and the asymptotic representation of $\hat\theta$, respectively. Based on the results in Theorem 1, we further obtain the following corollary.

Corollary 1 Under the conditions of Theorem 1, $\sqrt n(\hat\alpha_1 - \alpha_1) \to N(0, \sigma_1^2(\Sigma_{XX}^{-1} + B))$ and $\sqrt n(\hat\beta - \beta) \to N(0, \sigma_2^2\Sigma_{XX}^{-1} + \sigma_1^2B)$ in distribution, where $\Sigma_{XX} = E(xx^T)$ and $B = \Sigma_{XX}^{-1}\Sigma_{XM_A}\Sigma_{M_AM_A \cdot X}^{-1}\Sigma_{M_AX}\Sigma_{XX}^{-1}$, with $\Sigma_{XM_A} = E(xm_A^T) = \Sigma_{M_AX}^T$ and $\Sigma_{M_AM_A \cdot X} = E(m_Am_A^T) - \Sigma_{M_AX}\Sigma_{XX}^{-1}\Sigma_{XM_A}$.

This corollary presents the asymptotic normality of the estimators $\hat\alpha_1$ and $\hat\beta$. We next make a theoretical comparison with the estimators in Zhou et al. (2020).
Note that the asymptotic variance of the corresponding estimators in Zhou et al. (2020) takes the same form with $B$ replaced by a matrix $\tilde B$. To show that our proposed estimators are more efficient than those proposed in Zhou et al. (2020), it suffices to show that $\tilde B > B$. Note that $\Sigma_{XX}^{-1} + B = (I_q, 0_{q \times s})\Sigma^{-1}(I_q, 0_{q \times s})^T$ by the block matrix inversion formula, and a direct comparison of the two asymptotic variance matrices then yields $\tilde B > B$. Hence our proposed estimators are more efficient than those proposed in Zhou et al. (2020). This should not be surprising, because the debiased Lasso inflates its asymptotic variance in the debiasing step for high-dimensional linear models (van de Geer et al., 2014). The proposed partial penalized least squares method does not penalize $\alpha_1$, and hence the debiasing step becomes unnecessary.

Under the normality assumption that $\varepsilon_1 \sim N(0, \sigma_1^2)$ and $\varepsilon \sim N(0, \Sigma^*)$, it can be shown that our proposed estimators are indeed asymptotically efficient. Under this assumption, the maximum likelihood estimator (MLE) $(\hat\alpha_1^M, \hat\alpha_{0,A}^M)$ of $(\alpha_1, \alpha_{0,A})$ in the oracle model knowing $\alpha_{0,A^c} = 0$ satisfies the same asymptotic representation as in Theorem 1. Recall that Theorem 1 indicates that, with probability tending to 1, $\hat\alpha_{0,A^c} = 0$; by the definitions of $\hat\gamma$ and $\hat\alpha_1$, it then follows that $\hat\alpha_{0,A}$ and $\hat\alpha_{0,A}^M$ have the same asymptotic distribution. Consequently, $\hat\beta$ has the same asymptotic distribution as $\hat\beta^M$, and is thus asymptotically efficient.

To form the test statistic for the indirect effect $\beta$, we first study its asymptotic variance matrix. Let $\hat A = \{j : \hat\alpha_{0j} \neq 0\}$. With probability tending to 1, we have $\hat A = A$. The variance matrix $\Sigma$ and the error variance $\sigma_1^2$ can then be estimated by the estimated sample version and the mean squared errors, respectively. As is shown, $\hat\sigma_1^2 = \sigma_1^2 + o_P(1)$; in fact, when $s = o(n^{1/2})$, we have $\hat\sigma_1^2 = \sigma_1^2 + O_P(n^{-1/2})$. Alternatively, we can estimate $\sigma_1^2$ using refitted cross-validation (Fan et al., 2012) or the scaled lasso (Sun & Zhang, 2013). As for $\sigma_2^2$, we first estimate $\sigma^2 = var(\varepsilon_3) = \sigma_1^2 + \sigma_2^2$ by the classic least squares residual variance estimator $\hat\sigma^2$ based on model (2.3), and then set $\hat\sigma_2^2 = \hat\sigma^2 - \hat\sigma_1^2$. In practice, $\hat\sigma_1^2$ may sometimes be larger than $\hat\sigma^2$, in which case we simply set $\hat\sigma_2^2 = 0$. This is possible when no mediators are relevant, that is, $\alpha_0 = 0$, and hence $\sigma_2^2$ indeed equals zero. According to Corollary 1, the asymptotic variance matrices of $\hat\alpha_1$ and $\hat\beta$ can be consistently estimated by $\hat\sigma_1^2(\hat\Sigma_{XX}^{-1} + \hat B)$ and $\hat\sigma_2^2\hat\Sigma_{XX}^{-1} + \hat\sigma_1^2\hat B$, respectively, where $\hat\Sigma_{XX} = X^TX/n$ and $\hat B$ is the sample analogue of $B$ constructed from $(X, M_{\hat A})$. The Wald test statistic for the hypotheses in (2.4) can then be derived as
$S_n = n\hat\beta^T(\hat\sigma_2^2\hat\Sigma_{XX}^{-1} + \hat\sigma_1^2\hat B)^{-1}\hat\beta$.
Clearly, under $H_0$, $S_n \to \chi_q^2$, a chi-square random variable with $q$ degrees of freedom. To investigate the local power of $S_n$, we consider the local alternative hypotheses $H_{1n}: \beta = \Delta/\sqrt n$, where $\Delta$ is a constant vector. From Corollary 1, under such local alternative hypotheses, $S_n \to \chi_q^2(\Delta^T(\sigma_2^2\Sigma_{XX}^{-1} + \sigma_1^2B)^{-1}\Delta)$, a chi-square random variable with $q$ degrees of freedom and noncentrality parameter $\Delta^T(\sigma_2^2\Sigma_{XX}^{-1} + \sigma_1^2B)^{-1}\Delta$. Thus, $S_n$ can detect local effects that converge to 0 at the root-$n$ rate.
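The construction of $S_n$ can be summarized in a few lines of code. The sketch below assumes `partial_penalized_fit` from the sketch in section 2.1, and uses one natural choice of degrees-of-freedom corrections for $\hat\sigma_1^2$ and $\hat\sigma^2$; the authors' exact corrections may differ.

```python
import numpy as np
from scipy.stats import chi2

def wald_indirect_test(y, M, X, alpha1, alpha0):
    """Wald test S_n for H_01: beta = 0, a sketch based on Corollary 1."""
    n, q = X.shape
    A = np.flatnonzero(alpha0)                     # estimated active set A_hat
    gamma = np.linalg.solve(X.T @ X, X.T @ y)      # total effect (2.6)
    beta = gamma - alpha1                          # indirect effect estimate
    # sigma1^2 from the outcome-model residuals, sigma^2 from model (2.3).
    rss1 = np.sum((y - M @ alpha0 - X @ alpha1) ** 2)
    sigma1_sq = rss1 / (n - q - len(A))
    sigma_sq = np.sum((y - X @ gamma) ** 2) / (n - q)
    sigma2_sq = max(sigma_sq - sigma1_sq, 0.0)     # truncate at 0 if negative
    # B_hat from the sample second-moment matrix of (x, m_A).
    Z = np.hstack([X, M[:, A]])
    Sigma_inv = np.linalg.inv(Z.T @ Z / n)
    Sxx_inv = np.linalg.inv(X.T @ X / n)
    B = Sigma_inv[:q, :q] - Sxx_inv
    Omega = sigma2_sq * Sxx_inv + sigma1_sq * B    # asy. variance of sqrt(n) beta_hat
    S_n = n * beta @ np.linalg.solve(Omega, beta)
    return S_n, chi2.sf(S_n, df=q)                 # statistic and chi^2_q p-value
```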
It is of interest to test the following hypothesis:
$H_{02}: \alpha_1 = 0$ versus $H_{12}: \alpha_1 \neq 0$. (2.13)
Under $H_{02}$, we estimate $\alpha_0$ by the penalized least squares
$\min_{\alpha_0}\frac{1}{2}\|y - M\alpha_0\|^2 + n\sum_{j=1}^{p}p_\lambda(|\alpha_{0j}|)$. (2.14)
Denote by $\tilde\alpha_0$ the resulting penalized least squares estimator. Then the residual sum of squares (RSS) under $H_{02}$ is $RSS_0 = \|y - M\tilde\alpha_0\|^2$. Under $H_{12}$, we can estimate $\alpha_0$ and $\alpha_1$ by the partial penalized least squares method in (2.7), and then calculate $RSS_1 = \|y - M\hat\alpha_0 - X\hat\alpha_1\|^2$, the RSS under $H_{12}$. The F-type test for hypothesis (2.13) is defined to be
$T_n = \frac{RSS_0 - RSS_1}{RSS_1/(n-q)}$. (2.15)
Theorem 2 below shows that the asymptotic null distribution of $T_n$ is a chi-square distribution with $q$ degrees of freedom. To evaluate the local power of $T_n$ under local alternative hypotheses, we impose the following assumption.

A4. Consider local alternative hypotheses $H_{1n}: \alpha_1 = h_n$. Assume that $\|h_n\|_2 = O(\sqrt{1/n})$.

Theorem 2 Suppose that Conditions (A1)-(A4) hold and $s = o(n^{1/3})$. It follows that $T_n \to \chi_q^2(nh_n^T\Phi^{-1}h_n/\sigma_1^2)$, a chi-square random variable with $q$ degrees of freedom and noncentrality parameter $nh_n^T\Phi^{-1}h_n/\sigma_1^2$, where $\Phi = (I_q, 0_{q \times s})\Sigma^{-1}(I_q, 0_{q \times s})^T$.

Theorem 2 implies that under $H_{02}$, $T_n$ asymptotically follows the $\chi_q^2$ distribution, which does not depend on any parameter in the model. This is similar to the Wilks phenomenon for the likelihood ratio test in the classical statistical setting; in other words, the Wilks phenomenon still holds in this high-dimensional mediation model. Theorem 2 also implies that $T_n$ can detect local alternatives that are distinct from the null hypothesis at the rate of $1/\sqrt n$.

To compute the partial penalized estimators $\hat\alpha_1$ and $\hat\beta$, we apply the local linear approximation (LLA) algorithm of Zou & Li (2008) with the SCAD penalty of Fan & Li (2001), and set $a = 3.7$. For a fixed regularization parameter $\lambda$, the minimization of the partial penalized least squares objective can be carried out as follows: obtain an initial estimate $(\hat\alpha_1^{(0)}, \hat\alpha_0^{(0)})$ by minimizing the partial $L_1$-penalized least squares $\frac{1}{2}\|y - M\alpha_0 - X\alpha_1\|^2 + n\lambda\|\alpha_0\|_1$, and then update the estimate by minimizing the weighted $L_1$ problem $\frac{1}{2}\|y - M\alpha_0 - X\alpha_1\|^2 + n\sum_{j=1}^{p}p'_\lambda(|\hat\alpha_{0j}^{(0)}|)|\alpha_{0j}|$, in which the SCAD penalty is replaced by its local linear approximation at the current estimate. In practice, we use a data-driven method to choose the tuning parameter $\lambda$. Following Wang et al. (2013), we use the HBIC criterion, whose score is defined as
$HBIC(\lambda) = \log(\|y - M\alpha_0 - X\alpha_1\|_2^2) + df \cdot \log(\log(n))\log(p+q)/n$,
where $df$ is the number of variables with nonzero coefficients in $(\alpha_0^T, \alpha_1^T)^T$. Minimizing $HBIC(\lambda)$ yields the selected $\lambda$.
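The sketch below spells out this recipe, reusing `partial_penalized_fit` from section 2.1 as the $L_1$ initializer. The weighted $L_1$ step is emulated by rescaling columns, with a small floor on the weights for numerical stability; this is an illustrative implementation under our own conventions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lam(t) of the SCAD penalty (Fan & Li, 2001), for t >= 0."""
    t = np.abs(t)
    return lam * ((t <= lam) + (t > lam) * np.maximum(a * lam - t, 0) / ((a - 1) * lam))

def lla_fit(y, M, X, lam, n_steps=1):
    """One-step LLA (Zou & Li, 2008) for the SCAD partial penalized problem,
    initialized by the L1 fit; assumes partial_penalized_fit from section 2.1."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    y_t, M_t = y - H @ y, M - H @ M                # project out unpenalized X
    alpha1, alpha0 = partial_penalized_fit(y, M, X, lam)
    for _ in range(n_steps):
        w = np.maximum(scad_deriv(alpha0, lam), 1e-8 * lam)  # LLA weights
        scale = lam / w                            # weighted L1 via column rescaling
        b = Lasso(alpha=lam, fit_intercept=False).fit(M_t * scale, y_t).coef_
        alpha0 = b * scale
        alpha1 = np.linalg.solve(X.T @ X, X.T @ (y - M @ alpha0))
    return alpha1, alpha0

def hbic_select(y, M, X, lam_grid):
    """Tuning-parameter selection by the HBIC score of Wang et al. (2013)."""
    n, p, q = len(y), M.shape[1], X.shape[1]
    best, fit = np.inf, None
    for lam in lam_grid:
        alpha1, alpha0 = lla_fit(y, M, X, lam)
        rss = np.sum((y - M @ alpha0 - X @ alpha1) ** 2)
        df = np.count_nonzero(alpha0) + np.count_nonzero(alpha1)
        hbic = np.log(rss) + df * np.log(np.log(n)) * np.log(p + q) / n
        if hbic < best:
            best, fit = hbic, (alpha1, alpha0, lam)
    return fit
```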
In this section, we examine the finite sample performance of the proposed procedures via Monte Carlo simulation studies and illustrate the proposed procedure by a real data example. We first examine the finite sample performances of the proposed partial-penalization-based test statistics $S_n$ and $T_n$, comparing them with the oracle counterparts $S_n^O$ and $T_n^O$, computed as if the true set of active mediators were known, and with the tests $S_n^Z$ and $T_n^Z$ based on the estimators of Zhou et al. (2020).

Example 1. In this example, we set $n = 300$, $q = 1$, and $p = 500$; $x \sim N(0, 1)$ and $m = \Gamma^Tx + \varepsilon$, where $\varepsilon \sim N(0, \Sigma^*)$ with $\Sigma^*$ having an AR correlation structure, that is, the $(i, j)$-element of $\Sigma^*$ equals $\rho^{|i-j|}$, and $\rho$ is set to be 0.5. Take $\Gamma = c_1(\tau_1, \cdots, \tau_p)^T$, where $\tau_k = 0.2k$ for $k = 1, \cdots, 5$, and for $k > 5$ the $\tau_k$'s are independently generated from $N(0, 0.1^2)$. Set $c_1 = 0$ to examine the Type I error rate, and $c_1 = \pm 0.1, \pm 0.2, \cdots, \pm 1$ for power, when testing the indirect effects. We generate the response $y$ from the model $y = \alpha_0^Tm + \alpha_1^Tx + \varepsilon_1$, where $\varepsilon_1 \sim N(0, 0.5^2)$, $\alpha_0 = [1, 0.8, 0.6, 0.4, 0.2, 0, \cdots, 0]^T$, and $\alpha_1 = c_2$ is set in the same fashion as $c_1$. The simulation results are based on 500 replications, and the significance level is set to be 0.05.

We first compare the performances of $S_n$, $S_n^O$ and $S_n^Z$ for testing the indirect effect $\beta$, setting $c_2 = 0.5$, so that $\beta = \Gamma\alpha_0 = 1.4c_1$. As shown in the left panel of Figure 1, $S_n$ performs as well as the oracle $S_n^O$, and is generally more powerful than $S_n^Z$. For instance, when $c_1 = -0.2$, the empirical power of $S_n^Z$ is 0.516, while the empirical powers of $S_n$ and $S_n^O$ are 0.596. These observations are consistent with the theoretical results in section 2. When testing the direct effect, the power curve of $T_n$ is almost the same as the oracle one, and $T_n$ is clearly more powerful than the test $T_n^Z$ proposed in Zhou et al. (2020), whose power curve is asymmetric. In fact, when $c_2 = -0.2$, the empirical powers of our test statistic $T_n$ and the oracle test $T_n^O$ are about 1, while that of $T_n^Z$ is only about 0.780. Furthermore, $T_n^Z$ performs unstably in our simulation studies. To gain insight into this, we explore $\hat\alpha_1^Z$ and $\hat\beta^Z$ further. The estimates $\hat\alpha_1$, $\hat\beta$ and $\hat\alpha_1^O$, $\hat\beta^O$ are reported in Table 1, from which it can be seen that the biases of $\hat\alpha_1$, $\hat\beta$ and $\hat\alpha_1^O$, $\hat\beta^O$ are very small, while $\hat\alpha_1^Z$ has a large bias. This may be because the direct effect $\alpha_1$ is also penalized in the estimation procedure of Zhou et al. (2020) based on the scaled lasso, which makes sense only if the direct effect is expected to be zero. As seen in Table 1, the bias of $\hat\alpha_1^Z$ is very small when $c_2 = 0$, yet large when $c_2 \neq 0$. Table 1 also reports standard errors of the corresponding estimates; both the proposed method and the oracle outperform Zhou et al. (2020), especially when estimating $\alpha_1$. To assess the accuracy of the variance estimation of $\hat\alpha_1$ and $\hat\beta$, Table 2 reports their estimated standard errors under two settings for each method (new, oracle, and Zhou et al.'s): holding $c_2 = 0.5$ and varying $c_1 = 0, \pm 0.1, \cdots, \pm 1$ in (a), and holding $c_1 = 0.5$ and varying $c_2 = 0, \pm 0.1, \cdots, \pm 1$ in (b). Lastly, Table 3 reports the computing times, where the new method is nearly 1000 times faster than Zhou et al.'s method. The proposed method is very fast and stable because, initialized by the LASSO estimator, the LLA algorithm converges in one step.

Example 2. In this example, we examine the finite sample performance of the proposed method when heavy-tailed errors are encountered. Specifically, assume now $\varepsilon_1 \sim t_6/\sqrt 6$; the scaling factor $\sqrt 6$ ensures that the variance of $\varepsilon_1$ equals that when $\varepsilon_1 \sim N(0, 0.5^2)$. All other settings are identical to those in Example 1. We first investigate the performances of $S_n$, $S_n^O$ and $S_n^Z$ for testing the indirect effect $\beta$ via the left panel of Figure 3. The proposed test $S_n$ performs as well as the oracle one $S_n^O$ in terms of controlling the Type I error rate ($c_1 = 0$), and it possesses much larger power than $S_n^Z$ (when $c_1 \neq 0$), especially when $c_1 < 0$. Similar phenomena are observed in the right panel of Figure 3 when examining $T_n$, $T_n^O$ and $T_n^Z$: the proposed test $T_n$ performs as well as the oracle one, and is more powerful than the test $T_n^Z$. In fact, when $c_2 = -0.2$, the empirical powers of our test statistic $T_n$ and the oracle test $T_n^O$ are about 1, while that of $T_n^Z$ is only about 0.756. In addition, we also evaluate the accuracy and precision of $\hat\alpha_1$ and $\hat\beta$ through Tables 4 and 5. The overall pattern in these two tables with $\varepsilon_1 \sim t_6/\sqrt 6$ is very similar to that for $\varepsilon_1 \sim N(0, 0.5^2)$. In sum, the proposed method retains its validity for heavy-tailed error distributions.
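For reference, the following sketch generates one data set from the Example 1 design (Example 2 differs only in drawing $\varepsilon_1$ from $t_6/\sqrt 6$) and applies the estimation and testing sketches from section 2, whose function names it assumes.

```python
import numpy as np

def gen_example1(n=300, p=500, c1=0.5, c2=0.5, rho=0.5, seed=None):
    """Generate one data set from the Example 1 design."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma_star = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1) covariance
    L = np.linalg.cholesky(Sigma_star)
    tau = np.concatenate([0.2 * np.arange(1, 6), rng.normal(0, 0.1, p - 5)])
    x = rng.normal(size=(n, 1))
    M = x @ (c1 * tau)[None, :] + rng.normal(size=(n, p)) @ L.T
    alpha0 = np.concatenate([[1.0, 0.8, 0.6, 0.4, 0.2], np.zeros(p - 5)])
    y = M @ alpha0 + c2 * x[:, 0] + rng.normal(0, 0.5, n)
    return y, M, x

y, M, X = gen_example1(seed=1)
alpha1, alpha0, lam = hbic_select(y, M, X, lam_grid=np.logspace(-2, 0, 20))
S_n, pval = wald_indirect_test(y, M, X, alpha1, alpha0)
```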
We apply the proposed method to an empirical analysis examining whether financial statement items and metrics mediate the relationship between company sectors and stock price recovery after the COVID-19 pandemic outbreak. While investors and researchers reached a consensus long ago that stock returns rely heavily on companies' sectors, recent studies focus more on using financial statements or market conditions to predict stock returns. Fama & French (1993)'s pioneering proposal of the three-factor model started this era, capturing patterns of returns using market return, firm size and book-to-market ratio factors. Callen & Segal (2004) showed that accruals, cash flow, and growth in operating income significantly influence stock returns. Edirisinghe & Zhang (2007, 2008) developed a relative financial strength metric based on data envelopment analysis (Farrell, 1957; Charnes et al., 1978), and found that return on assets and the solvency ratio have high correlation with stock price return. To enhance prediction accuracy, deep neural network and data mining techniques were developed, with historical financial statements as model inputs and stock price return as output (Enke & Thawornwong, 2005; Huang et al., 2019; Lee et al., 2019). Meanwhile, it is reasonable to hypothesize that companies' sectors affect stock performance via influencing the associated financial metrics. Few existing works, however, study the mediating effects of such financial metrics. Hence our analysis aims to fill this gap, using the proposed mediation analysis to select important financial metrics, as well as to test the direct and indirect effects of companies' sectors on returns. In addition, in this analysis we are specifically interested in the stock performance of S&P 500 component companies during the COVID-19 pandemic period. As is known, the outbreak of COVID-19 dealt a shock to the U.S. economy with unprecedented speed, and the government had to impose a lockdown to stop the spread of the virus. The lockdown took a toll on the U.S. economy: businesses were closed, millions of people lost jobs, and the price of an oil futures contract fell below zero. The crisis spread to the U.S. stock market, dragging down the major index S&P 500 by 33.92%. To help businesses, households and the economy, the Federal Reserve and the White House launched various rescue programs and took measures to stabilize energy prices from the end of March 2020. All these events and measures led the U.S. stock market to a V-shaped pattern, owing to which the general financial rules from the classical literature may no longer apply directly. Admittedly, a number of recent papers have studied the economic reaction to the COVID-19 pandemic using sector- or company-level data (Ramelli & Wagner, 2020; Zhang et al., 2020; Baker et al., 2020), for instance by analyzing firms' disclosures and opinions about future prospects. These studies discovered several important factors related to accounting and business fundamentals, including supply chain, production and operations, and financing, that are highly associated with stock market recovery from COVID-19. However, these methods mainly rely on prior financial knowledge to select low-dimensional data for modeling, while ignoring important company-level factors. Besides, these methods only consider the relation of stock returns to either the sector level or the company level, failing to recognize that a company's financials play a role in mediating sector effects on stock price return. Therefore, we use the proposed method to study the financial statement items or metrics that mediate the relationship between firm sectors and stock performance in this special period. This work may then shed light on how to select valuable stocks during a pandemic or similar adverse events.

In the mediation models, the response is taken to be the stock return from its highest price before the pandemic in February 2020 to April 30th, 2020. The closing price is adjusted for both dividends and splits. The potential mediators in $m$ are 550 accounting metrics computed from financial statements: we use the firms' latest annual reports to compute the financial metrics, and previous annual reports to compute the average growth rate of each financial metric. The exposure variables in $x$ are companies' sectors according to the Global Industry Classification Standard (GICS), coded as dummy variables. GICS classifies companies into eleven sectors: basic materials, communication services, consumer cyclical, consumer defensive, energy, financial services, healthcare, industrials, real estate, technology and utilities. We set the energy sector as the baseline level.
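Schematically, the data pipeline can be set up as follows. The file and column names are hypothetical placeholders, and the fitting and testing functions are the sketches from section 2.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per company, a 'sector' column, a 'return'
# column (pandemic-period stock return), and 550 financial-metric columns.
df = pd.read_csv("sp500_metrics.csv")                  # illustrative file name
dummies = pd.get_dummies(df["sector"])                 # one column per GICS sector
X = dummies.drop(columns=["Energy"]).to_numpy(float)   # energy sector as baseline
metric_cols = [c for c in df.columns if c not in ("sector", "return")]
M = df[metric_cols].to_numpy(float)
y = df["return"].to_numpy(float)

alpha1, alpha0, lam = hbic_select(y, M, X, lam_grid=np.logspace(-3, 0, 20))
S_n, pval = wald_indirect_test(y, M, X, alpha1, alpha0)
selected = [metric_cols[j] for j in np.flatnonzero(alpha0)]  # mediating metrics
```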
Table 6 presents the estimated direct and indirect effects of companies' sectors, together with their standard errors. We also calculate the Wald test for the indirect effect and the generalized likelihood test for the direct effect, with p-values smaller than $10^{-9}$ and $10^{-15}$, respectively, indicating that both the direct and indirect effects are significant. As for the direct effect, stocks in sectors such as healthcare and technology are more likely to outperform the benchmark than those from the utilities sector. Furthermore, sectors influence stock performance partly through business operations as reflected by the selected financial metrics, and the indirect effects are significantly positive. The selected mediating metrics, their estimated coefficients in model (2.1), and brief descriptions are presented in Table 7.

These selected metrics are of their own significance. For instance, the first three chosen metrics in Table 7, namely return on assets, gross margin and the annual growth rate of operating income, reflect firms' revenue. Return on assets is an indicator of how well a firm utilizes its assets, determining how profitable a firm is relative to its total assets. A firm with a higher return-on-assets value is preferred, as the firm squeezes more out of limited resources to make a profit. Gross margin is the portion of sales revenue a firm retains after subtracting the costs of producing the goods it sells and the services it provides; it measures the gross profit of a firm. A firm with a higher gross margin is more likely to retain more profit for every dollar of goods sold. The annual growth rate of operating income shows the firm's growth in generating operating income compared with the previous year. Operating income measures the amount of profit realized from a business's operations, after deducting operating expenses such as wages, depreciation, and the cost of goods sold. A firm with high growth of operating income can avoid unnecessary production costs and improve core business efficiency. In a word, a firm with higher return on assets, higher gross margin and growing operating income is considered profitable, and hence is likely to attract investors. On the other hand, both the average growth rate of the quick ratio and debt to assets are indicators of the financial leverage of a firm. The quick ratio of a firm is defined as the dollar amount of liquid assets divided by that of current liabilities, where liquid assets are the portion of assets that can be quickly converted into cash with minimal impact on the price received in the open market, while current liabilities are a firm's debts or obligations to be paid to creditors within one year. Thus a large quick ratio indicates that the firm is fully equipped with enough assets that can be instantly liquidated to pay off its current liabilities. Debt to assets is the total amount of debt relative to the assets owned by a firm, reflecting a firm's financial stability. Therefore, a firm with a higher quick ratio or a lower debt-to-assets ratio might be more likely to survive when it is difficult to finance through borrowing and cover its debts, and thus is more favorable to investors during the economic lockdown.
Lastly, receivables turnover quantifies a firm's effectiveness in collecting its receivables, or money owed by clients. It shows how well a firm uses and manages the credit it extends to customers and how quickly that short-term debt is paid. Receivables turnover can be negative when net credit sales are negative, because clients pre-pay for the product or service. A negative receivables turnover means that the firm is less susceptible to counterparty credit risk, because it already receives the cash from its clients before delivering the service or shipping out the product. This is especially important during liquidity-dry periods, when clients may default or delay payment due to a lack of cash. Therefore, a firm that has a negative receivables turnover is preferred.

On all accounts, one might incorporate these analysis results as a reference when selecting stocks, using the metrics in Table 7 to filter candidates. For example, we would select firms that have higher values of the annual growth rate of operating income, gross margin, quick ratio, and return on assets, but lower values of debt to assets and receivables turnover. Moreover, we compare our findings with those selected in established models. For instance, our method picks profitability factors like return on assets, which is also selected in Fama & French (2015), as profitability is at the core of a firm's stock performance. But we do not include metrics representing the size of the firm, the valuation of the stock price, or investment, which were covered by Fama & French (2015). For the firm-size factor, there is no evidence that small-size firms recovered faster or slower than larger ones. For the stock price valuation factor, previous price valuation ratios changed significantly due to stock price changes and are no longer reliable for predicting future stock returns. For the investment factor, it is less important for short-term stock price movements. Compared with Edirisinghe & Zhang (2008), our method also picks profitability (return on assets), liquidity (quick ratio) and solvency (debt to assets) metrics. During the crisis, a firm facing a liquidity crunch could not access credit; therefore, a firm with sufficient cash and less debt can survive more easily and is less likely to be forced to liquidate valuable assets at unfavorable prices, so its stock would be safer and more attractive to investors. But we did not select metrics of earnings per share or capital intensity as in Edirisinghe & Zhang (2008): the lockdown dramatically changed firms' revenue structures and capital allocation, and hence reduced the predictive capability of these metrics for short-term recovery.

In this paper, we propose statistical inference procedures for the indirect effects in high-dimensional mediation models. We introduce a partial penalized least squares method and study its statistical properties under random design. We show that the proposed estimators are more efficient than existing ones. We further propose a partial penalized Wald test to detect the indirect effect, with a $\chi^2$ limiting null distribution. We also propose an F-type test for the direct effect and reveal the Wilks phenomenon in the high-dimensional mediation model. We further utilize the proposed inference procedures to analyze the mediation effects of various financial metrics on the relationship between a company's sector and its stock return.

Acknowledgments: Guo and Liu's research was supported by National Natural Science Foundation of China grants 12071038, 11701034, and 11771361, and Li and Zeng's research was supported by National Science Foundation grants DMS 1820702, 1953196 and 2015539.

Appendix

Proof of Theorem 1: To enhance readability, we divide the proof of Theorem 1 into three steps, where $Q_n(\theta) = \frac{1}{2}\|y - M\alpha_0 - X\alpha_1\|^2 + n\sum_{j=1}^{p}p_{\lambda_n}(|\alpha_{0j}|)$ denotes the objective in (2.7). In the first step, we show that there exists a local minimizer $\bar\theta$ of $Q_n(\theta)$ under the constraint $\bar\alpha_{0,A^c} = 0$, such that $\|\bar\theta - \theta_0\|_2 = O_P(\sqrt{s/n})$. In the second step, we prove that $\bar\theta$ is indeed a local minimizer of $Q_n(\theta)$; this implies $\hat\theta = \bar\theta$.
In the final step, we derive the asymptotic expansion of $\hat\theta$.

Step 1: Consistency in the $(s+q)$-dimensional subspace. We first constrain $Q_n(\theta)$ to the $(s+q)$-dimensional subspace $\{\theta \in R^{p+q} : \alpha_{0,A^c} = 0\}$. This constrained partial penalized least squares objective is given by
$\bar Q_n(\vartheta) = \frac{1}{2}\|y - X\alpha_1 - M_A\delta\|^2 + n\sum_{j=1}^{s}p_{\lambda_n}(|\delta_j|)$,
where $\vartheta = (\alpha_1^T, \delta^T)^T$ and $\delta = (\delta_1, \cdots, \delta_s)^T$. We now show that there exists a strict local minimizer $\bar\vartheta$ of $\bar Q_n(\vartheta)$ such that $\|\bar\vartheta - \vartheta_0\|_2 = O_P(\sqrt{s/n})$. To this end, we consider the event $H_n = \{\inf_{\vartheta \in \partial N_\tau}\bar Q_n(\vartheta) > \bar Q_n(\vartheta_0)\}$, where $N_\tau = \{\vartheta \in R^{s+q} : \|\vartheta - \vartheta_0\|_2 \le \tau\sqrt{s/n}\}$ with $\tau \in (0, \infty)$, and $\partial N_\tau$ denotes the boundary of the closed set $N_\tau$. Clearly, on the event $H_n$, there exists a local minimizer of $\bar Q_n(\vartheta)$ in $N_\tau$. Thus, we only need to show that $P(H_n) \to 1$ as $n \to \infty$ when $\tau$ is large. To this aim, we next analyze the function $\bar Q_n$ on the boundary $\partial N_\tau$. For any $\vartheta \in \partial N_\tau$, a second-order Taylor expansion of $\bar Q_n$ at $\vartheta_0$ decomposes $\bar Q_n(\vartheta) - \bar Q_n(\vartheta_0)$ into a linear term, a quadratic term, and a penalty curvature term involving $\Lambda(\alpha_{0,A}^*)$, where $\alpha_{0,A}^*$ lies on the line segment joining $\delta$ and $\alpha_{0,A}$, and $\Lambda(\alpha_{0,A}^*)$ is a diagonal matrix with nonnegative diagonal elements. Clearly $\alpha_{0,A}^* \in N_0$, and by condition (A2) the maximum eigenvalue of $\Lambda(\alpha_{0,A}^*)$ is bounded above by $\lambda_n\kappa_0$. Since $\lambda_{min}(\Sigma) \ge c$ and $\lambda_n\kappa_0 = o(1)$, the quadratic term dominates the penalty curvature term, and the linear term is controlled by the score $\nu$. By the Markov inequality, it then suffices to show that $E\|\nu\|_2^2 = O(s/n)$; this follows from condition (A1), together with the concavity of $\rho(\cdot)$, $d_n < |\alpha_{0j,A}|$, and condition (A2). Consequently, step 1 is completed.

Step 2: Sparsity. According to Theorem 1 in Fan & Lv (2011), it suffices to show that, with probability tending to 1,
$\|M_{A^c}^T(y - X\bar\alpha_1 - M\bar\alpha_0)\|_\infty < np'_{\lambda_n}(0+)$,
where $\bar\theta = (\bar\alpha_1^T, \bar\alpha_0^T)^T$ satisfies $\bar\alpha_{0,A^c} = 0$ and $\|\bar\theta - \theta_0\|_2 = O_P(\sqrt{s/n})$. Decompose $M_{A^c}^T(y - X\bar\alpha_1 - M\bar\alpha_0)$ into $M_{A^c}^T\varepsilon_1$ plus a second term driven by the estimation error of $\bar\vartheta$; the second term is bounded using Step 1 and the moment bounds in condition A3. Next we determine the rate of the first term $\|M_{A^c}^T\varepsilon_1\|_\infty$. Let $a_n = n^{1/\ell+\varsigma}K_n$ and $b = \sqrt{Cn\log p}$ with $C$ large enough, and decompose each entry $m_{ij}\varepsilon_{1i}$, $j \in A^c$, into the truncated part $\xi_{ij,1} = m_{ij}\varepsilon_{1i}I(|m_{ij}\varepsilon_{1i}| \le a_n) - E[m_{ij}\varepsilon_{1i}I(|m_{ij}\varepsilon_{1i}| \le a_n)]$ and the tail part $\xi_{ij,2} = m_{ij}\varepsilon_{1i}I(|m_{ij}\varepsilon_{1i}| > a_n) - E[m_{ij}\varepsilon_{1i}I(|m_{ij}\varepsilon_{1i}| > a_n)]$. We have
$P(\|M_{A^c}^T\varepsilon_1\|_\infty > b) \le P(|\sum_i\xi_{ij,1}| > b/2 \text{ for some } j \in A^c) + P(|\sum_i\xi_{ij,2}| > b/2 \text{ for some } j \in A^c) =: P_1 + P_2$.
First consider the term $P_1$. Note that $\xi_{1j,1}, \ldots, \xi_{nj,1}$ are independent centered random variables a.s. bounded by $2a_n$ in absolute value; the Bernstein inequality then yields that $P_1 \to 0$. Next we turn to $P_2$. Note that
$E^2[|m_j\varepsilon_1|I(|m_j\varepsilon_1| > a_n)] \le E[m_j^2\varepsilon_1^2]P(|m_j\varepsilon_1| > a_n) \le E[m_j^2\varepsilon_1^2]E[|m_j\varepsilon_1|^{\ell}]/a_n^{\ell}$.
We then conclude that $P_2 \to 0$, and hence $\|M_{A^c}^T\varepsilon_1\|_\infty = O_P(\sqrt{n\log p})$. Consequently, given condition A2, step 2 is finished.

Step 3: Asymptotic expansions. Steps 1 and 2 show that $\hat\alpha_{0,A^c} = 0$ with probability tending to 1, and further $\|\hat\alpha_{0,A} - \alpha_{0,A}\|_2 = O_P(\sqrt{s/n})$. Denote by $L(\vartheta) = \frac{1}{2}\|y - X\alpha_1 - M_A\delta\|^2$ the unpenalized least squares loss and by $L'(\vartheta)$ its gradient. For $\hat\theta$, on the event $\hat\alpha_{0,A^c} = 0$, the first-order condition of (2.7) gives $L'(\hat\vartheta) + n\lambda_n(0_q^T, \bar\rho(\hat\alpha_{0,A}, \lambda_n)^T)^T = 0$. Under condition (A2), we have $\|\hat\alpha_{0,A} - \alpha_{0,A}\|_\infty = O_P(\sqrt{s/n}) \ll d_n$, which implies $\min_{j \in A}|\hat\alpha_{0j,A}| > \min_{j \in A}|\alpha_{0j,A}| - d_n = d_n$. By the concavity of $p_\lambda(\cdot)$ and condition (A2), we obtain that $\|n\lambda_n\bar\rho(\hat\alpha_{0,A})\|_2 \le ns^{1/2}p'_{\lambda_n}(d_n) = o(n^{1/2})$. Since $\lambda_{max}(\Sigma^{-1}) = O(1)$, it follows that
$\sqrt n(\hat\vartheta - \vartheta_0) = \Sigma^{-1}\frac{1}{\sqrt n}L'(\vartheta_0) + o_P(1)$. (A.8)
Writing the relevant influence functions as i.i.d. terms $W_{1i}$ and $W_{2i}$, we further obtain that $var(W_{1i}) = \sigma_2^2\Sigma_{XX}$, $var(W_{2i}) = \sigma_1^2\Sigma_{MM \cdot X}$, and their covariance is computed similarly. As a result, the conclusions of Theorem 1 follow.

Proof of Theorem 2: Similar to the arguments in the proof of Theorem 1, we can also show that $\tilde\alpha_{0,A^c} = 0$ with probability tending to 1, and further $\|\tilde\alpha_{0,A} - \alpha_{0,A}\|_2 = O_P(\sqrt{s/n})$. Denote $\Delta\hat\theta = \hat\theta - \tilde\theta = (\Delta\hat\theta_1, \Delta\hat\theta_2)$, where $\tilde\theta$ is the estimator under $H_{02}$, and let $D_1$ denote the corresponding sample second-moment matrix. From the proof of Theorem 1, it is known that $n\lambda_n\|\bar\rho(\hat\alpha_{0,A})\|_2 = o_P(n^{1/2})$ and, similarly, $n\lambda_n\|\bar\rho(\tilde\alpha_{0,A})\|_2 = o_P(n^{1/2})$. Further recall that $\|D_1 - \Sigma\|_2 = O_P(s/\sqrt n)$. Note that $\sqrt n\Delta\hat\theta_1 = \sqrt n(\hat\alpha_1 - \alpha_1) + \sqrt nh_n = O_P(1)$ from Corollary 1.
Thus we get $\sqrt n\Delta\hat\theta_2 = O_P(1)$, which further implies that $\Delta\hat\theta_2^Tn\lambda_n\bar\rho(\tilde\alpha_{0,A}) = o_P(1)$. Now we are ready to investigate the asymptotic distribution of $T_n$. On the event $\hat\alpha_{0,A^c} = \tilde\alpha_{0,A^c} = 0$, and recalling equation (A.15), we can show that
$RSS_1 - RSS_0 = -2\Delta\hat\theta^TL'(\tilde\theta) + \Delta\hat\theta^TnD_1\Delta\hat\theta = -n\Delta\hat\theta_1^T(\Sigma^{11})^{-1}\Delta\hat\theta_1 + o_P(1)$. (A.16)
Now denote $\Phi = (I_q, 0_{q \times s})\Sigma^{-1}(I_q, 0_{q \times s})^T$. It is easy to see that $\Phi = \Sigma^{11}$. From the proof of Corollary 1, it is known that $\sqrt n(\hat\alpha_1 - \alpha_1) \to N(0, \sigma_1^2\Phi)$. Thus we obtain that
$RSS_0 - RSS_1 = \|\Phi^{-1/2}[\sqrt n(\hat\alpha_1 - \alpha_1)] + \sqrt n\Phi^{-1/2}h_n\|_2^2 + o_P(1)$.
On the other hand, we can decompose $RSS_1/(n-q)$ into three terms $I_1 + I_2 + I_3$. It is easy to see that $I_1 \to \sigma_1^2$, while $I_3 \le \|\hat\theta - \vartheta_0\|_2^2\|D_1\|_2 = O_P(s/n) = o_P(1)$. Further note that $I_2 \le \|\hat\theta - \vartheta_0\|_2\|L'(\vartheta_0)\|_2/(n-q) = O_P((1/n)\sqrt{s/n}\sqrt{ns}) = O_P(s/n) = o_P(1)$. In sum, it follows that
$RSS_1/(n-q) = \sigma_1^2 + o_P(1)$. (A.18)
As a result, we have
$T_n = \frac{RSS_0 - RSS_1}{RSS_1/(n-q)} \to \chi_q^2(nh_n^T\Phi^{-1}h_n/\sigma_1^2)$.

Proof of Corollary 1: Recall that $\hat\beta = \hat\gamma - \hat\alpha_1$ and the representation (A.8). The asymptotic variance matrix of $\hat\alpha_1$ is $\sigma_1^2\Phi = \sigma_1^2(\Sigma_{XX}^{-1} + B)$. Consequently, we obtain the asymptotic normality of $\hat\alpha_1$ and $\hat\beta$ stated in Corollary 1.

Under the independence conditions on the random errors in the models, the sequential ignorability assumption (Imai et al., 2010) holds, and the natural direct and indirect effects can be identified. As argued by Imai et al. (2010), only the sequential ignorability assumption is needed, and neither the linearity nor the no-interaction assumption is required for the identification of mediation effects. However, in the situation with high-dimensional mediators, it would be very challenging, if not impossible, to make inference about the mediation models without the linearity and no-interaction assumptions. The linearity and no-interaction assumptions are widely adopted in recent studies of the HDMM (Zhang et al., 2016; van Kesteren & Oberski, 2019; Zhou et al., 2020). To define the natural direct and natural indirect effects, we first give some notation. Let $y(x^*, m^*)$ denote the potential outcome that would have been observed had $x$ and $m$ been set to $x^*$ and $m^*$, respectively, and let $m(x^*)$ denote the potential mediator that would have been observed had $x$ been set to $x^*$. Following Imai et al. (2010), Vanderweele & Vansteelandt (2014), and others, for $x = x_1$ versus $x_0$, the natural direct effect is defined as $E[y(x_1, m(x_0)) - y(x_0, m(x_0))]$, while the indirect effect is defined as $E[y(x_1, m(x_1)) - y(x_1, m(x_0))]$. Then the total effect $E[y(x_1, m(x_1)) - y(x_0, m(x_0))]$ is the sum of the natural direct and indirect effects. Vanderweele & Vansteelandt (2014) showed that
$E[y(x_1, m(x_0)) - y(x_0, m(x_0))] = \alpha_1^T(x_1 - x_0)$ and
$E[y(x_1, m(x_1)) - y(x_1, m(x_0))] = (\Gamma\alpha_0)^T(x_1 - x_0)$.
Thus $\alpha_1$ can be interpreted as the average natural direct effect, and $\beta = \Gamma\alpha_0$ as the average natural indirect effect, of a one-unit change in the exposure $x$.
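For completeness, these two identities follow directly from models (2.1)-(2.2); a short derivation sketch, assuming sequential ignorability so that the potential-outcome means can be computed from the structural equations:

```latex
\begin{align*}
E\,y(x^{*}, m^{*}) &= \alpha_0^{T} m^{*} + \alpha_1^{T} x^{*}, \qquad
E\,m(x^{*}) = \Gamma^{T} x^{*},\\
E[y(x_1, m(x_0)) - y(x_0, m(x_0))]
  &= \alpha_1^{T}(x_1 - x_0), && \text{(natural direct effect)}\\
E[y(x_1, m(x_1)) - y(x_1, m(x_0))]
  &= \alpha_0^{T}\Gamma^{T}(x_1 - x_0) = (\Gamma\alpha_0)^{T}(x_1 - x_0). && \text{(natural indirect effect)}
\end{align*}
```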
References

- Fundamental analysis, future earnings, and stock prices
- Estimation and inference for the counterfactual distribution and quantile functions in continuous treatment models
- Approximate residual balancing: debiased inference of average treatment effects in high dimensions
- The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations
- The unprecedented stock market reaction to COVID-19
- Inference on treatment effects after selection among high-dimensional controls
- Do accruals drive firm-level stock returns? A variance decomposition analysis
- Inference for individual mediation effects and interventional effects in sparse high-dimensional causal graphical models
- Measuring the efficiency of decision making units
- High-dimensional multivariate mediation with application to neuroimaging data
- Causal impact of masks, policies, behavior on early COVID-19 pandemic in the US
- The effects of two influential early childhood interventions on health and healthy behaviour
- Estimating the COVID-19 cash crunch: global evidence and policy
- High dimensional mediation analysis with latent variables
- The value relevance of financial statements and their impact on stock prices
- Estimation and inference for distribution functions and quantile functions in treatment effect models
- Generalized DEA model of fundamental analysis and its application to portfolio optimization
- Portfolio selection under DEA-based relative financial strength indicators: case of US industries
- The use of data mining and neural networks for forecasting stock market returns
- Common risk factors in the returns on stocks and bonds
- A five-factor asset pricing model
- Variance estimation using refitted cross-validation in ultrahigh dimensional regression
- Variable selection via nonconcave penalized likelihood and its oracle properties
- Statistical Foundations of Data Science
- Nonconcave penalized likelihood with NP-dimensionality
- IPAD: stable interpretable forecasting with knockoffs inference
- RANK: large-scale inference with graphical nonlinear knockoffs
- The measurement of productive efficiency
- Coronavirus: impact on stock prices and growth expectations
- The value-relevance of financial and non-financial information for Internet companies
- Firm-level exposure to epidemic diseases: COVID-19, SARS, and H1N1
- Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-based Approach
- Neural network models for stock selection based on fundamental analysis
- Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators
- Direct and indirect effects of continuous treatments based on generalized propensity score weighting
- A general approach to causal mediation analysis
- Nonparametric estimation of average treatment effects under exogeneity: a review
- Confidence intervals and hypothesis testing for high-dimensional regression
- Global stock market investment strategies based on financial network indicators using machine learning techniques
- Advances in mediation analysis: a survey and synthesis of new developments
- Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models
- Feverish stock price reactions to COVID-19. The Review of Corporate Finance Studies
- Linear hypothesis testing for high-dimensional generalized linear models
- Bayesian shrinkage estimation of high dimensional causal mediation effects in omics studies
- Scaled sparse linear regression
- A review of causal estimation of effects in mediation analyses
- The impact of the COVID-19 pandemic on the US economy: evidence from the stock market
- On asymptotically optimal confidence regions and tests for high-dimensional models
- Exploratory mediation analysis with many potential mediators
- Explanation in Causal Inference: Methods for Mediation and Interaction
- Mediation analysis with multiple mediators
- Calibrating non-convex penalized regression in ultra-high dimension
- Quantile regression for analyzing heterogeneity in ultra-high dimension
- Financial markets under the global pandemic of COVID-19
- Estimating and testing high-dimensional mediation effects in epigenetic studies
- Confidence intervals for low dimensional parameters in high dimensional linear models
- Sparse principal component based high-dimensional mediation analysis
- Estimation and inference for the indirect effect in high-dimensional linear mediation models
- One-step sparse estimates in nonconcave penalized likelihood models