key: cord-0502464-4e127aki authors: Wang, Xufei; Jiang, Bo; Liu, Jun S. title: Varying Coefficient Model via Adaptive Spline Fitting date: 2022-01-25 journal: nan DOI: nan sha: be064b4bae5b72abc75f4880e57d19750ea2e6ce doc_id: 502464 cord_uid: 4e127aki The varying coefficient model has received wide attention from researchers as it is a powerful dimension reduction tool for non-parametric modeling. Most existing varying coefficient models fitted with polynomial spline assume equidistant knots and take the number of knots as the hyperparameter. However, imposing equidistant knots appears to be too rigid, and determining the optimal number of knots systematically is also a challenge. In this article, we deal with this challenge by utilizing polynomial splines with adaptively selected and predictor-specific knots to fit the coefficients in varying coefficient models. An efficient dynamic programming algorithm is proposed to find the optimal solution. Numerical results show that the new method can achieve significantly smaller mean squared errors for coefficients compared with the commonly used kernel-based method. The problem of accurately estimating the relationship between a response variable and multiple predictor variables is fundamental to statistics and machine learning, and many other scientific applications. Among parametric models, the linear regression model is a simple and powerful approach, yet it is somewhat limited by its linearity assumption, which is often violated in real applications. In contrast, non-parametric models do not assume any specific type of relationship between the response variable and the predictors, which offer some flexibility in modeling nonlinear relationships, and are powerful alternatives to linear models. However, non-parametric models often need to impose certain local smoothness assumptions which are mostly achieved by employing a certain class of kernels or spline basis functions, to overcome the overfitting issue. 
Such methods are plagued by the curse of dimensionality: they are both ineffective at capturing the true relationship and computationally expensive for high-dimensional data sets. Serving as a bridge between linear and non-parametric models, the varying coefficient model (Hastie & Tibshirani, 1993) provides an attractive compromise between simplicity and flexibility. In this class of models, the regression coefficients are not fixed constants but are allowed to depend on other conditioning variables. As a result, the varying coefficient model is more flexible because of the infinite dimensionality of the corresponding parameter space. Compared to standard non-parametric approaches, it rises as a powerful strategy to cope with the curse of dimensionality, while inheriting the simplicity and interpretability of linear models. A typical setup for the varying coefficient model is as follows. Given response y and predictors X = (x_1, . . . , x_p)^T, the model assumes that

y = x_1 β_1(u) + · · · + x_p β_p(u) + ε = X^T β(u) + ε,

where u is the conditional random variable, which is usually a scalar. The varying coefficient model has a broad range of applications, including longitudinal data (Tang & Cheng, 2012), functional data (Zhang & Wang, 2014; Hu et al., 2019), and spatial data (Wang & Sun, 2019; Finley & Banerjee, 2020). Moreover, varying coefficient models extend naturally to time series contexts (Huang & Shen, 2004; Lin et al., 2019). There are three major approaches to estimating the coefficients β_j(u) (j = 1, . . . , p). One widely acknowledged approach is the smoothing spline method proposed by Hastie & Tibshirani (1993), with a recent follow-up using P-splines (Astrid et al., 2009). Fan & Zhang (1999) and Fan & Zhang (2000) proposed a two-step kernel approach for estimating the coefficients: the first step uses local cubic smoothing to model the function β_j(u), and the second step re-applies local cubic smoothing to the residuals.
A recent follow-up of their work is the adaptive estimator of Chen et al. (2015). Approximating the coefficient functions with a basis expansion, e.g. polynomial B-splines, has long been popular, as it leads to simple estimation and inference procedures with good theoretical properties. Compared with smoothing spline and kernel methods, the polynomial spline method with a finite number of knots strikes a balance between model flexibility and interpretability of the estimated coefficients. Huang et al. (2002) and Huang & Shen (2004), among others, utilized polynomial spline estimators. These works assume that the knots are equidistant and that the number of knots is chosen so that the bias terms become asymptotically negligible, which guarantees local asymptotic normality. Most polynomial spline approaches optimize over a finite-dimensional class of functions, such as the space of polynomial B-splines with L equally spaced knots. If the true turning points of the coefficients are not equidistant but we nevertheless assume equally spaced knots, then L must be large enough that the distances between knots are small enough to capture the resolution of the coefficients. In theory, it is not yet clear how to determine L systematically. In practice, L is chosen by a grid search together with the other tuning parameters, since one needs to compare a set of candidate values before fixing the number of knots. Too few knots may miss the high-frequency features of β_j(u), while too many knots may over-fit regions where the coefficients barely change. Variable selection for varying coefficient models, especially when the number of predictors exceeds the sample size, is also an interesting direction to study.
In this paper, we propose two adaptive algorithms built on fitting piece-wise linear functions with automatically selected turning points for a univariate conditioner variable u. Compared with the traditional methods above, these algorithms automatically determine the optimal positions of knots, modeled as the turning points of the true coefficients. We prove that, if the coefficients are piece-wise linear in u, the knots selected by our methods are almost surely the true change points. We further combine the knot selection algorithms with the adaptive group lasso for variable selection in high-dimensional problems, following Wei et al. (2011), who apply the adaptive group lasso to basis expansions of predictors. Our simulation studies demonstrate that the new adaptive method achieves smaller mean squared errors for estimating the coefficients than available methods and improves variable selection performance. We finally apply the method to study the association between environmental factors and COVID-19 infected cases and observe time-varying effects of the environmental factors.

2 Adaptive spline fitting methods

In varying coefficient models, each coefficient β_j(u) is a function of the conditional variable u, which can be estimated by fitting a polynomial spline on u. In this paper we assume that u is univariate. Let X_i = (x_{i,1}, . . . , x_{i,p})^T ∈ R^p, u_i, and y_i denote the ith observations of the predictor vector, the conditional variable, and the response variable, respectively, for i = 1, . . . , n. Suppose the knots, common to all coefficients, are located at d_1 < . . . < d_L, and the corresponding B-splines of degree D are B_k(u) (k = 1, . . . , D + L + 1).
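To make the basis concrete, the degree-D B-spline functions B_k(u) on [0, 1] with interior knots d_1 < . . . < d_L can be generated by the standard Cox–de Boor recursion. The sketch below is a plain numpy illustration; the clamped boundary knots (each endpoint repeated D + 1 times) are our assumption, since the boundary-knot convention is not specified here. It produces the D + L + 1 basis functions, which are non-negative and sum to one at every point of [0, 1).

```python
import numpy as np

def bspline_basis(u, interior_knots, D):
    """Degree-D B-spline basis with clamped boundary knots on [0, 1].

    Returns a (len(u), L + D + 1) matrix, L = len(interior_knots),
    built with the Cox-de Boor recursion.
    """
    t = np.concatenate([np.zeros(D + 1), np.asarray(interior_knots, float), np.ones(D + 1)])
    u = np.asarray(u, dtype=float)
    # Degree-0 basis: indicator of each half-open knot interval.
    B = np.array([(t[j] <= u) & (u < t[j + 1]) for j in range(len(t) - 1)], dtype=float).T
    for d in range(1, D + 1):
        nb = len(t) - 1 - d
        Bn = np.zeros((len(u), nb))
        for j in range(nb):
            if t[j + d] > t[j]:            # skip 0/0 terms at repeated knots
                Bn[:, j] += (u - t[j]) / (t[j + d] - t[j]) * B[:, j]
            if t[j + d + 1] > t[j + 1]:
                Bn[:, j] += (t[j + d + 1] - u) / (t[j + d + 1] - t[j + 1]) * B[:, j + 1]
        B = Bn
    return B

u = np.linspace(0.0, 0.999, 200)
B = bspline_basis(u, [0.3, 0.7], D=3)   # L = 2 interior knots, degree D = 3
# D + L + 1 = 6 columns; rows sum to one on [0, 1)
```

The design matrix for fitting a varying coefficient then simply multiplies each predictor column x_j elementwise with each basis column B_k(u).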
Each varying coefficient can be represented as β_j(u) = Σ_{k=1}^{D+L+1} c_{j,k} B_k(u), where the coefficients c_{j,k} are estimated by minimizing the following sum of squared errors:

Σ_{i=1}^n { y_i − Σ_{j=1}^p Σ_{k=1}^{D+L+1} c_{j,k} B_k(u_i) x_{i,j} }².   (1)

In previous work, the knots for the polynomial spline were typically chosen as equidistant quantiles of u and shared by all predictors. This approach is computationally straightforward, but the chosen knots cannot properly reflect the varying smoothness between and within the coefficients. We propose an adaptive knot selection approach in which the knots can be interpreted as turning points of the coefficients. For knots d_1 < . . . < d_L, we define the segmentation scheme S = {s_1, . . . , s_{L+1}} for the observed samples ordered by u, where s_k = {i | d_{k−1} < u_i ≤ d_k}, with d_0 = −∞ and d_{L+1} = ∞. If the true coefficient vector β(u) = (β_1(u), . . . , β_p(u))^T ∈ R^p is a linear function of u within each segment, i.e., β(u) = a_s + b_s u with a_s, b_s ∈ R^p for u in segment s, then the observed response satisfies

y_i = X_i^T (a_s + b_s u_i) + ε_i,  ε_i ~ N(0, σ_s²),  for i ∈ s.   (2)

Thus, the coefficients can be estimated by maximizing the log-likelihood function, which is equivalent to minimizing the loss function

ℓ(S) = Σ_{s∈S} |s| log σ̂_s²,   (3)

where σ̂_s² is the residual variance from regressing y_i on (X_i, u_i X_i) for i ∈ s. Because any almost-everywhere continuous function can be approximated by piece-wise linear functions, we can employ the estimating framework in (2) and derive (3). To avoid over-fitting when maximizing the log-likelihood over the parameters and segmentation schemes, we constrain each |s| to be at least a lower bound m_s = n^α (α > 0), and penalize the loss function with the number of segments:

ℓ_{λ_0}(S) = Σ_{s∈S} |s| log σ̂_s² + λ_0 log(n) |S|,   (4)

where λ_0 > 0 is the penalty strength. The resulting knots correspond to the segmentation scheme S that minimizes the penalized loss function (4). When λ_0 is very large, this strategy tends to select no knots, whereas when λ_0 is close to 0 it can select as many as n^{1−α} knots.
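The penalized segmentation loss described above can be minimized exactly over all contiguous segmentations by dynamic programming. The sketch below is our simplified numpy illustration for the special case of a single intercept predictor (x = 1), so each segment is an ordinary linear regression of y on u; it runs the recursion naively, recomputing each segment fit from scratch rather than with the incremental updates that yield the paper's stated O(n²) complexity, and the toy data, seed, and penalty value are our own choices.

```python
import numpy as np

def select_knots(x, u, y, lam, min_seg):
    """Exact minimization of sum_s |s| * log(var_s) + lam * (#segments)
    over contiguous segmentations of the samples ordered by u."""
    order = np.argsort(u)
    u, y, x = u[order], y[order], x[order]
    n = len(y)
    Z = np.column_stack([x, u * x])            # linear-in-u design within a segment

    def seg_cost(a, b):                        # samples a..b-1 form one segment
        coef, *_ = np.linalg.lstsq(Z[a:b], y[a:b], rcond=None)
        resid = y[a:b] - Z[a:b] @ coef
        var = max(resid @ resid / (b - a), 1e-12)   # floor guards log(0)
        return (b - a) * np.log(var)

    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for i in range(min_seg, n + 1):            # forward recursion over segment ends
        for j in range(0, i - min_seg + 1):    # j = start index of last segment - 1
            if np.isfinite(best[j]):
                c = best[j] + seg_cost(j, i) + lam
                if c < best[i]:
                    best[i], prev[i] = c, j
    knots, i = [], n                            # backward tracing of boundaries
    while i > 0:
        j = prev[i]
        if j > 0:
            knots.append(0.5 * (u[j - 1] + u[j]))   # knot at midpoint between segments
        i = j
    return sorted(knots)

rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0.0, 1.0, n)
# Piece-wise linear signal with one turning point at u = 0.5
y = np.where(u <= 0.5, 2 * u, 2 - 2 * u) + rng.normal(0.0, 0.05, n)
knots = select_knots(np.ones(n), u, y, lam=5 * np.log(n), min_seg=10)
```

With this clear single turning point, the exact search recovers a knot near u = 0.5; a larger penalty lam suppresses spurious segments, as described in the text.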
We find the optimal λ_0 by minimizing the Bayesian information criterion (Schwarz, 1978) of the fitted model. For a given λ_0, suppose L(λ_0) knots are finally proposed and the fitted model is f̂(X, u). Then

BIC(λ_0) = n log[ Σ_{i=1}^n { y_i − f̂(X_i, u_i) }² / n ] + p{L(λ_0) + D + 1} log(n).   (5)

The optimal λ_0 is selected by searching over a grid to minimize BIC(λ_0). We call this procedure the global adaptive knot selection strategy, as it assumes that all predictors share the same set of knots. We will discuss in Section 2.4 how to allow each predictor to have its own set of knots. Here we use the piece-wise linear model (2) and loss function (4) only for knot selection; the varying coefficients are then fitted with B-splines derived from the resulting knots by minimizing (1). In this way, the fitted varying coefficients are smooth functions whose smoothness is determined by the degree of the splines. This method is referred to as the one-step adaptive spline fitting throughout the paper. The proposed method is invariant to the marginal distribution of u. Without loss of generality, we assume that u follows the uniform distribution on [0, 1]. According to Definition 1 below, a turning point, denoted c_k, is a location of a local maximum or minimum of any coefficient β_j(u) (j = 1, . . . , p). In Theorem 1, we show that the adaptive knot selection approach can almost surely detect all the turning points of β(u).

Theorem 1 Suppose y = X^T β(u) + ε, where X is bounded and β(u) is a bounded continuous function. Moreover, assume X, u, and ε are independent. Let 0 < c_1 < . . . < c_K < 1 be the turning points in Definition 1 and d_1 < . . . < d_L be the selected knots with m_s = n^α (α > 0). Then for 0 < γ < 1 − α and λ_0 large enough, with probability tending to one, every turning point c_k lies within n^{−γ} of some selected knot d_l.

Details of the proof are in Section 3.1 of the supplementary material. Although the adaptive spline fitting method is motivated by a piece-wise linear model, Theorem 1 shows that with probability approaching 1 we can detect all the turning points accurately under general varying coefficient functions.
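The BIC used for tuning λ_0 can be computed directly from the fitted values. The sketch below assumes the standard Gaussian goodness-of-fit term n log(RSS/n) together with the penalty counting p{L(λ_0) + D + 1} spline coefficients; the function and variable names are our own.

```python
import numpy as np

def bic(y, fitted, p, n_knots, degree):
    """BIC for a varying coefficient model with p predictors fitted by
    B-splines sharing n_knots knots of the given degree.

    The n * log(RSS / n) goodness-of-fit term is the standard Gaussian
    form (an assumption here); the penalty counts the
    p * (n_knots + degree + 1) spline coefficients.
    """
    y, fitted = np.asarray(y, float), np.asarray(fitted, float)
    n = len(y)
    rss = np.sum((y - fitted) ** 2)
    return n * np.log(rss / n) + p * (n_knots + degree + 1) * np.log(n)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
fitted = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
# With the fit held fixed, every extra knot raises BIC by p * log(n)
```

This makes explicit why very small λ_0, which proposes many knots, is rejected at the BIC stage even though it lowers the residual sum of squares.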
The selected knots may form a superset of the true turning points, especially when λ_0 is small, so we tune λ_0 with the BIC (5), which penalizes the number of selected knots, to find the optimal set. When the varying coefficient functions are piece-wise linear, each coefficient β_j(u) (j = 1, . . . , p) can be almost surely written as β_j(u) = a_{j,k} + b_{j,k} u for u ∈ (c_{j,k}, c_{j,k+1}] (k = 0, . . . , K_j), with c_{j,0} = 0 and c_{j,K_j+1} = 1. If K_j = 0, β_j(u) is a simple linear function of u; otherwise we call c_{j,k} (k = 1, . . . , K_j) the change points of the coefficient β_j(u), because the linear relationship differs before and after these points. Then ∪_{j: K_j ≥ 1} {c_{j,1}, . . . , c_{j,K_j}} are all the change points of the entire varying coefficient function. In Theorem 2, we show that the adaptive knot selection method can almost surely discover the change points of β(u) without false positive selection.

Theorem 2 Suppose (X, y, u) follow the same assumptions as in Theorem 1, and β(u) is a piece-wise linear function of u. Let 0 < c_1 < . . . < c_K < 1 be the change points defined above and d_1 < . . . < d_L be the selected knots with m_s = n^α (α > 0). Then for 0 < γ < 1 − α and λ_0 large enough, with probability tending to one, L = K and |d_k − c_k| ≤ n^{−γ} for every k.

Details of the proof are in Section 3.2 of the supplementary material. The theorem shows that if the varying coefficient function is piece-wise linear, the method discovers all the change points with probability approaching one. The brute-force algorithm for computing the optimal knots has computational complexity O(2^n) and is impractical. As summarized below, we propose a dynamic programming approach whose computational complexity is of order O(n²). If we further assume that the knots can only be chosen from a predetermined set M, e.g. the m/√n (m = 1, . . . , √n − 1) quantiles of the u_i's, the complexity can be further reduced to O(|M|²). Note that the algorithm in Section 2.4 of Wang et al. (2017) is a special case with constant x. The algorithm is summarized in the following three steps:

Step 1 Data preparation: Arrange the data (X_i, u_i, y_i) (i = 1, . . . , n) in increasing order of u_i. Without loss of generality, we assume that u_1 < · · · < u_n.

Step 2 Obtain the minimum loss by forward recursion: Define m_s = n^α (α > 0) as the smallest segment size, and set λ = λ_0 log(n). Initialize two sequences Loss_0 = 0 and Prev_0 = 0, and for i = m_s, . . . , n compute

Loss_i = min_{i′: i − i′ + 1 ≥ m_s} { Loss_{i′−1} + (i − i′ + 1) log σ̂²_{i′,i} + λ },

with Prev_i set to the minimizing i′. Here σ̂²_{i′,i} is the residual variance of regressing y_k on (X_k, u_k X_k) (k = i′, . . . , i).

Step 3 Determine knots by backward tracing: Let p = Prev_n. If p = 1, no knot is needed; otherwise, we recursively add 0.5(u_{p−1} + u_p) as a new knot and set p = Prev_p, until p = 1.

When the algorithm is run over a grid of λ_0 values, we repeat Steps 2 and 3 for each λ_0 and return the final model with the minimum BIC.

The global adaptive knot selection method introduced in Section 2.1 assumes that the knot locations are the same for all predictors, as in most of the polynomial spline fitting literature. However, different coefficients may have different resolutions of smoothness relative to u, so a separate set of knots for each predictor is preferable. Here we propose a two-step adaptive spline fitting algorithm on top of the global knot selection. Suppose the fitted model from the one-step adaptive spline fitting is f̂(X, u) = Σ_{j=1}^p β̂_j(u) x_j. For predictor j, the knots can be updated by performing the same knot selection procedure between it and the residual obtained without it, that is,

r_i = y_i − Σ_{j′ ≠ j} β̂_{j′}(u_i) x_{i,j′}.   (6)

If the BIC in (5) is smaller for the new polynomial spline model with the new knots, the knots and fitted model are updated. These steps are repeated until the BIC cannot be further reduced. The following three steps summarize the algorithm:

Step 1 One-step adaptive spline: Run the one-step adaptive spline fitting algorithm on (X_i, u_i, y_i) (i = 1, . . . , n). Denote the fitted model by f̂(X, u) and compute the model BIC with (5).

Step 2 Update knots with BIC: For j = 1, . . . , p, compute the residual variable r_i by (6).
Run the one-step adaptive spline fitting algorithm on the residual and predictor j, that is, on (x_{i,j}, u_i, r_i) (i = 1, . . . , n). Denote the new fitted model by f̂_j(X, u) and the corresponding BIC by BIC_j. Note that when each predictor is allowed its own number of knots, the term p{L(λ_0) + D + 1} log(n) in (5) should be replaced by {Σ_{j=1}^p L_j(λ_0) + p(D + 1)} log(n), where L_j(λ_0) is the number of knots for predictor j.

Step 3 Recursively update model: If min_j BIC_j < BIC, set j* = arg min_j BIC_j, update the current model to f̂(X, u) = f̂_{j*}(X, u), and repeat Step 2. Otherwise return the fitted model f̂(X, u).

An alternative approach to modeling the heterogeneous relationships among coefficients is to replace the initial model in Step 1 with f̂(X, u) = 0 and repeat the two steps above. However, starting from the one-step model is preferred because fitting the residual instead of the original response reduces the mean squared error more efficiently. Simulation studies in Section 3 show that predictor-specific knots further reduce the mean squared error of the fitted coefficients compared with the global knot selection approach.

When the number of predictors is large and only a few of them have non-zero varying coefficients, we perform variable selection over all predictors. For predictor j, we run the knot selection procedure between it and the response, and construct B-spline functions {B_{j,k}(u)} (k = 1, . . . , L_j + D + 1), where L_j is the number of knots and D is the degree of the B-splines. Then we apply the variable selection method for varying coefficient models proposed by Wei et al. (2011), a generalization of the group lasso (Yuan & Lin, 2006) and the adaptive lasso (Zou, 2006). Wei et al. (2011) show that the group lasso tends to over-select variables and suggest the adaptive group lasso for variable selection.
In their original algorithm, the knots for each predictor are chosen as equidistant quantiles and are not predictor-specific. The following three steps summarize our proposed algorithm:

Step 1 Select knots for each predictor: Run the one-step adaptive spline fitting algorithm between each predictor x_{i,j} and the response y_i. Denote the corresponding B-splines by B_{j,k}(u) (k = 1, . . . , L_j + D + 1), where L_j is the number of knots for predictor j and D is the degree of the splines.

Step 2 Group lasso for first-step variable selection: Run the group lasso algorithm on the loss function

Σ_{i=1}^n { y_i − Σ_{j=1}^p Σ_{k=1}^{L_j+D+1} c_{j,k} B_{j,k}(u_i) x_{i,j} }² + λ_1 Σ_{j=1}^p (c_j^T R_j c_j)^{1/2},   (7)

where c_j = (c_{j,1}, . . . , c_{j,L_j+D+1})^T and R_j is the kernel matrix whose (k_1, k_2) element is E{B_{j,k_1}(u) B_{j,k_2}(u)}. Denote the fitted coefficients by c̃_{j,k}.

Step 3 Adaptive group lasso for second-step variable selection: Run the adaptive group lasso algorithm on the updated loss function with weights w_j = ∞ if c̃_j^T R_j c̃_j = 0 and w_j = (c̃_j^T R_j c̃_j)^{−1/2} otherwise. Denote the fitted coefficients by ĉ_{j,k}; the selected variables are those with ĉ_j^T R_j ĉ_j ≠ 0.

For Steps 2 and 3, the parameters λ_1 and λ_2 are chosen by minimizing the BIC (5) of the fitted model, where the degrees of freedom are computed using only the selected predictors. Simulation studies in Section 3.2 show that, with the knot selection in Step 1, the algorithm can always choose the correct predictors with a reasonable number of samples.

3 Simulation studies

In this section, we compare the one-step and two-step adaptive spline fitting approaches with a commonly used kernel method implemented in the package tvReg (Casas & Fernandez-Casal, 2019), using the simulation examples from Tang & Cheng (2012). The model simulates longitudinal data, which are frequently encountered in biomedical applications. In this example, the conditional random variable represents time, and for convenience we write t instead of u. We simulate n individuals, each with a scheduled time set {0, 1, . . . , 19} at which observations may be generated.
A scheduled time is skipped with probability 0.6, in which case no observation is generated at that time. For each non-skipped scheduled time, the actual observed time is the scheduled time plus a random disturbance from Unif(0, 1). Consider the time-dependent predictors X(t) = (x_1(t), x_2(t), x_3(t), x_4(t))^T with x_1(t) = 1 and x_2(t) ∼ Bern(0.6). The response for individual i at observed time t_{i,k} is y_i(t_{i,k}) = X_i(t_{i,k})^T β(t_{i,k}) + ε_i(t_{i,k}). The random errors ε_i(t_{i,k}) are independent of the predictors and are generated with two components; random errors are positively correlated within the same individual and independent between different individuals. Figure 1 shows the true coefficients and the coefficients fitted by the two-step adaptive spline and kernel methods for an example with n = 200. We observe that the coefficients fitted by the two-step method are smoother than those fitted by the kernel method. We compare the three methods by the mean squared errors of their estimated coefficients, i.e.,

MSE_j = (1/N) Σ_{i=1}^n Σ_{k=1}^{n_i} {β̂_j(t_{i,k}) − β_j(t_{i,k})}²,

where β_j(t) and β̂_j(t) are the true and estimated coefficients, respectively, n_i is the total number of observations for individual i, and N = Σ_{i=1}^n n_i. We run the simulation with n = 200 for 1000 repetitions. For the adaptive spline methods we only consider knots from the m/√N (m = 1, . . . , √N) quantiles of the t_{i,k}. Table 1 shows the average MSE_j for the proposed one-step and two-step methods, in comparison with the kernel method. The two-step method has the smallest MSE_j for all four coefficients, significantly outperforming the other two methods. Interestingly, the one-step adaptive method selects on average 6.1 global knots for all predictors, whereas the two-step method selects fewer on average: 5.8 knots for x_1(t), 4.2 for x_2(t), 4.0 for x_3(t), and 2.0 for x_4(t). This result is expected since β_3(t) and β_4(t) are less volatile than β_1(t) and β_2(t). The two-step procedure thus achieves a better fit with fewer knots than the other two methods.
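The observation scheme used in this simulation (scheduled times 0 through 19, each skipped with probability 0.6, and each surviving time jittered by a Unif(0, 1) disturbance) can be generated as follows; this is a sketch, with the seed and array layout being our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals = 200
scheduled = np.arange(20)                       # scheduled times 0, 1, ..., 19

obs_times = []                                  # one array of observed times per individual
for _ in range(n_individuals):
    keep = rng.random(scheduled.size) > 0.6     # each scheduled time skipped w.p. 0.6
    obs_times.append(scheduled[keep] + rng.uniform(0.0, 1.0, keep.sum()))

total = sum(len(t) for t in obs_times)          # about 0.4 * 200 * 20 = 1600 on average
```

Because each individual keeps a different random subset of times, the pooled observation times t_{i,k} are irregular across individuals, which is exactly the setting where quantile-based candidate knots are convenient.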
We use the simulation example from Wei et al. (2011) to compare the performance of our method with one using the adaptive group lasso and equidistant knots. As in the previous subsection, there are n individuals, each with a scheduled time set {0, 1, . . . , 29} and a skipping probability of 0.6. For each non-skipped scheduled time, the actual observed time is the scheduled time plus a random disturbance from Unif(0, 1). We construct p = 500 time-dependent predictors with x_1(t) ∼ Unif(0.05 + 0.1t, 2.05 + 0.1t), x_j(t) (j = 2, . . . , 5), x_6(t) ∼ N(3 exp{(t + 0.5)/30}, 1), and x_j(t) ∼ N(0, 4) (j = 7, . . . , 500). The same individual's predictors x_j(t) (j = 7, . . . , 500) are correlated with cor{x_j(t), x_j(s)} = exp(−|t − s|). The response for individual i at observed time t_{i,k} is y_i(t_{i,k}) = Σ_{j=1}^6 β_j(t_{i,k}) x_{i,j}(t_{i,k}) + ε_i(t_{i,k}). The time-varying coefficients β_j(t) (j = 1, . . . , 6) are β_1(t) = 15 + 20 sin{π(t + 0.5)/15}, β_2(t) = 15 + 20 cos{π(t + 0.5)/15}, β_3(t) = 2 − 3 sin{π(t − 24.5)/15}, β_4(t) = 2 − 3 cos{π(t − 24.5)/15}, β_5(t) = 6 − 0.2(t + 0.5)², and β_6(t) = −4 + 5 × 10^{−4} (19.5 − t)³. The random error ε_i(t_{i,k}) is independent of the predictors and follows the same distribution as in Section 3.1. We simulate cases with n = 50, 100, and 200, and replicate each setting 200 times. Three metrics are considered: the average number of selected variables, the percentage of cases with no false negative, and the percentage of cases with neither false positives nor false negatives. A comparison of our method with the variable selection method using equidistant knots (Wei et al., 2011) is summarized in Table 2. Our method clearly outperforms the method without predictor-specific knot selection and can always select the correct predictors when n is at least 100.

4 Real data analysis

The dataset we investigate contains daily measurements of meteorological data and air quality data in 7 counties of the state of New York between March 1, 2020, and September 30, 2021.
The meteorological data were obtained from the National Oceanic and Atmospheric Administration Regional Climate Centers, Northeast Regional Climate Center at Cornell University: http://www.nrcc.cornell.edu. The daily data are based on the average of hourly measurements from several stations in each county and comprise records of five meteorological components: temperature in Fahrenheit, dew point in Fahrenheit, wind speed in miles per hour, precipitation in inches, and humidity in percentage. The air quality data were obtained from the Environmental Protection Agency: https://www.epa.gov. They contain daily records of two major air quality components: fine particles with an aerodynamic diameter of 2.5 µm or less (PM2.5), in µg/m³, and ozone, in µg/m³. The objective of the study is to understand the association between the meteorological measurements, in conjunction with pollutant levels, and the number of infected cases of COVID-19, a contagious disease caused by severe acute respiratory syndrome coronavirus 2, and to examine whether the association varies over time. The daily infection records were retrieved from the official website of the New York State Department of Health: https://data.ny.gov. Figure 2 shows scatter plots of daily infected cases in New York County and the 7 environmental components over time. To remove the variation in recorded cases between weekdays and weekends, in the following study we consider the weekly averaged infected cases, i.e., the average of each day and the following 6 days. We also find that the temperature and dew point factors are highly correlated, so we remove the dew point factor when fitting the model.
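The weekly averaging step (each day averaged with the following 6 days) is a forward-looking moving average. A sketch of this preprocessing on a synthetic daily series, with the log offset and variable names being our own illustrative choices:

```python
import numpy as np

def forward_weekly_average(daily):
    """Average of each day and the following 6 days; the last 6 days are
    dropped because their windows are incomplete."""
    daily = np.asarray(daily, dtype=float)
    return np.convolve(daily, np.ones(7) / 7.0, mode="valid")

daily_cases = np.arange(20.0)                   # toy daily counts 0, 1, ..., 19
weekly = forward_weekly_average(daily_cases)    # mean of days t..t+6 equals t + 3
log_weekly = np.log(weekly + 1.0)               # log transform; +1 offset avoids log(0)
```

On the toy linear series the window mean of days t through t + 6 is simply t + 3, which makes the forward alignment of the window easy to verify before applying the same transform to real case counts.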
We first take the logarithmic transformation of the weekly averaged infected cases, denoted y, and then fit a varying coefficient model with the following predictors: x_1 = 1 as the intercept, x_2 temperature, x_3 wind speed, x_4 precipitation, x_5 humidity, x_6 PM2.5, and x_7 ozone. Time t is the conditioner for our model. All predictors except the constant are normalized to make the varying coefficients β_j(t) comparable. The varying coefficient model has the form

y_i(t_{i,k}) = Σ_{j=1}^7 β_j(t_{i,k}) x_{i,j}(t_{i,k}) + ε(t_{i,k}),   (8)

where t_{i,k} is the kth record time for the ith county, y_i(t_{i,k}) and x_{i,j}(t_{i,k}) are the corresponding records for county i at time t_{i,k}, and ε(t_{i,k}) iid ∼ N(0, σ²) denotes the error term. We apply the proposed two-step adaptive spline fitting method to fit the data. Figure 3 shows the fitted β_j(t) for each predictor. There is a strong time effect on each coefficient function. For example, the intercept has several peaks, which correspond to the initial outbreak and the delta variant outbreak. Moreover, the coefficients change rapidly around March 2020, which could be explained by the fact that testing was limited at the beginning of the outbreak, so the number of infected cases was underestimated. The coefficient curves also show that the most important predictor is temperature. For most of the period its coefficient is negative, indicating that high temperature is negatively associated with transmission of the virus; slower COVID-19 spread at high temperatures has also been reported in earlier studies. Beyond this negative association, our analysis demonstrates the time-varying nature of the coefficient. We note that environmental factors may not have an immediate effect on the number of recorded COVID-19 cases, since there is a 2–14 day incubation period before the onset of symptoms and it takes another 2–7 days before a test result is available.
To study whether there are lagging effects between the predictor and response variables, we fit a varying coefficient model for each time lag τ, i.e., using data y_i(t_{i,k} + τ) and x_{i,j}(t_{i,k}) (j = 1, . . . , 7) to fit model (8), similarly to Fan & Zhang (1999). Figure 4 shows the residual root mean squared error for each time lag; the error is smallest at a 4-day lag. Note that y_i(t) represents the logarithm of the weekly average between days t and t + 6, so the predictors at day t are most predictive of infection numbers between days t + 4 and t + 10, which is consistent with the incubation period and test duration. Figure 3 also shows the lagged coefficients with τ = 4 in dashed lines; their shapes are nearly identical to those of the non-lagged coefficients, except for the coefficient of PM2.5, which may indicate large day-to-day variation in the PM2.5 measurements.

5 Discussion

In this paper, we introduce two algorithms for fitting varying coefficient models with adaptive polynomial splines. The first algorithm selects knots globally using a recursive method, assuming the same set of knots for all predictors, whereas the second allows each predictor its own set of knots, likewise selected recursively. Coefficients modeled by polynomial splines with a finite number of non-regularly positioned knots are both more flexible and more interpretable than the standard ones with equidistant knot placements. Simulation studies show that both the one-step and two-step algorithms outperform the commonly used kernel method in terms of mean squared error, with the two-step algorithm achieving the best performance. A fast dynamic programming algorithm is introduced, with computational complexity at most O(n²), which reduces to O(n) if the knots are chosen from the m/√n (m = 1, . . . , √n − 1) quantiles of the conditional variable u.
In this paper, we assume that the variable u is univariate. The proposed two-step approach can be easily generalized to the case where each coefficient β_j(u_j) has its own univariate conditioner variable u_j. However, generalizing the proposed method to multi-dimensional conditioners and to correlated errors remains a challenging open problem.

References

Astrid et al. (2009). Pharmacokinetic parameters estimation using adaptive Bayesian P-splines models.
Casas & Fernandez-Casal (2019). tvReg: Time-varying coefficient linear regression for single and multi-equations in R.
Chen et al. (2015). Adaptive estimation for varying coefficient models.
Fan & Zhang (1999). Statistical estimation in varying coefficient models.
Fan & Zhang (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models.
Finley & Banerjee (2020). Bayesian spatially varying coefficient models in the spBayes R package.
Hastie & Tibshirani (1993). Varying-coefficient models.
Hu et al. (2019). Estimation and identification of a varying-coefficient additive model for locally stationary processes.
Huang & Shen (2004). Functional coefficient regression models for nonlinear time series: a polynomial spline approach.
Huang et al. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements.
Huang et al. (2004). Polynomial spline estimation and inference for varying coefficient models with longitudinal data.
Lin et al. (2019). Global kernel estimator and test of varying-coefficient autoregressive model.
Schwarz (1978). Estimating the dimension of a model. The Annals of Statistics.
Tang & Cheng (2012). Componentwise B-spline estimation for varying coefficient models with longitudinal data.
Wang & Sun (2019). Penalized local polynomial regression for spatial data.
Wang et al. (2017). Generalized R-squared for detecting dependence.
Wei et al. (2011). Variable selection and estimation in high-dimensional varying-coefficient models.
Yuan & Lin (2006). Model selection and estimation in regression with grouped variables.
Zhang & Wang (2014). Varying-coefficient additive models for functional data.
Zou (2006). The adaptive lasso and its oracle properties.

Acknowledgments

We are grateful to the National Oceanic and Atmospheric Administration Regional Climate Centers, Northeast Regional Climate Center at Cornell University for kindly sharing the meteorological data.
We are thankful to the Environmental Protection Agency and the New York State Department of Health for publishing the air quality data and daily infection records online. The views expressed herein are the authors' alone and are not necessarily the views of Two Sigma Investments, Limited Partnership, or any of its affiliates. We provide an R implementation of the varying coefficient model via adaptive spline fitting discussed in this paper. The R package is available at https://github.com/wangxf0106/vcmasf, which also provides detailed instructions on how to use the software.