key: cord-1007692-fngofl2z authors: Dubey, Paromita; Chen, Yaqing; Gajardo, Álvaro; Bhattacharjee, Satarupa; Carroll, Cody; Zhou, Yidong; Chen, Han; Müller, Hans-Georg title: Learning delay dynamics for multivariate stochastic processes, with application to the prediction of the growth rate of COVID-19 cases in the United States() date: 2021-09-28 journal: J Math Anal Appl DOI: 10.1016/j.jmaa.2021.125677 sha: c5155c2b7f829bec7dbc28adb9a008c27315ca12 doc_id: 1007692 cord_uid: fngofl2z Delay differential equations form the underpinning of many complex dynamical systems. The forward problem of solving random differential equations with delay has received increasing attention in recent years. Motivated by the challenge to predict the COVID-19 caseload trajectories for individual states in the U.S., we target here the inverse problem. Given a sample of observed random trajectories obeying an unknown random differential equation model with delay, we use a functional data analysis framework to learn the model parameters that govern the underlying dynamics from the data. We show the existence and uniqueness of the analytical solutions of the population delay random differential equation model when one has discrete time delays in the functional concurrent regression model and also for a second scenario where one has a delay continuum or distributed delay. The latter involves a functional linear regression model with history index. The derivative of the process of interest is modeled using the process itself as predictor and also other functional predictors with predictor-specific delayed impacts. This dynamics learning approach is shown to be well suited to model the growth rate of COVID-19 for the states that are part of the U.S., by pooling information from the individual states, using the case process and concurrently observed economic and mobility data as predictors. Delay differential equations form the underpinning of many complex dynamical systems. The forward problem of solving random differential equations with delay has received increasing attention in recent years. Motivated by the challenge to predict the COVID-19 caseload trajectories for individual states in the U.S., we target here the inverse problem. Given a sample of observed random trajectories obeying an unknown random differential equation model with delay, we use a functional data analysis framework to learn the model parameters that govern the underlying dynamics from the data. We show the existence and uniqueness of the analytical solutions of the population delay random differential equation model when one has discrete time delays in the functional concurrent regression model and also for a second scenario where one has a delay continuum or distributed delay. The latter involves a functional linear regression model with history index. The derivative of the process of interest is modeled using the process itself as predictor and also other functional predictors with predictor-specific delayed impacts. This dynamics learning approach is shown to be well suited to model the growth rate of COVID- 19 for the states that are part of the U.S., by pooling information from the individual states, using the case process and concurrently observed economic and mobility data as predictors. © 2021 Elsevier Inc. All rights reserved. The modeling of time dynamical systems is of interest in multiple scientific fields. While ordinary differential equations (ODE) have a long history, interest in delay differential equations (DDE) is more recent, albeit both have been extensively studied [1, 3, 22, 26, 61] . A DDE is a natural extension of an ODE when observed processes have an aftereffect. Unlike the situation for ODEs, the solution of a DDE depends not only on the initial state but on the entire history of the process in a time interval of length equal to the delay prior to the initial time point. Moving beyond the basic notion of a deterministic ODE, random differential equations (RDE) [23, 46, 54, 55] and stochastic differential equations (SDE) [2] are used to accommodate probabilistic uncertainty in temporal stochastic processes. In this paper we focus on RDEs, where the random effects are manifested in the model parameters. These include coefficients, initial conditions and forcing terms that are typically smooth in time, leading to differentiable sample path solutions of the RDE, in contrast to the situation for SDEs, which have a non-differentiable forcing term, typically a Wiener process, and therefore require stochastic calculus [34] . Extensive developments in the theory of the forward problem of obtaining solutions for RDEs stand in contrast with statistical approaches, which include data-oriented methodology for the inverse problem, that is, learning the nature of the differential equation from data. This approach has been referred to as empirical dynamics [45] . The starting point is a sample of random trajectories that are viewed as independent and identically distributed (i.i.d.) realizations of an underlying smooth (continuously differentiable) stochastic process. For a smooth stochastic process X(t), there always exists a function f such that with E(Z(t)|X(t)) = 0 almost surely. This forms the basis of empirical dynamics and has led to both parametric and nonparametric modeling approaches for f [57, 66] , which can be characterized as dynamics learning from data. Due to the minimal assumptions, dynamics learning covers a large number of specific ODEs that do not need to be a priori specified, however it requires that a sample of repeated realizations of the underlying stochastic process is available. Such samples form the backbone of functional data analysis [48, 58] . A pertinent example is provided by COVID-19 caseload trajectories, where one observes samples of such trajectories across geographic units such as states or countries [18] . Related but quite distinct statistical inference concerns the problem of identifying the parameters of an a priori specified dynamic system [12, 20, 37] , usually just from one or very few realizations. Here the trajectories are considered non-stochastic but are typically assumed to have been measured with noise. This leads to a curve fitting problem that in simple cases can be addressed with nonlinear least squares, but for which also nonparametric and semiparametric statistical methods have been employed [38, 47] . A key difference is that in dynamics learning one requires data that can be considered as an independent and identically distributed sample of realizations of an underlying smooth stochastic process X i (t), i = 1, . . . , n, while the parameter identification problem is often addressed given (noisy) data from one trajectory. The modeling approach that we introduce here follows the paradigm of dynamics learning, albeit in a somewhat more structured framework than empirical dynamics. The general form of DDEs for vector functions is d dt ) is the function value or state at time t and x τ (t) can be either a vector of the states evaluated at discrete time delays x m (t − τ m )), or alternatively an integral of the trajectory x(·) over a past period, representing a continuum of delays often called distributed delays [4] [5] [6] 8, 15, 27, 31, 35, 41] . Results on the existence and uniqueness of random differential equations with delay (RDED) are relatively recent, where [14] studied conditions for L p existence and uniqueness of the solutions of a RDED and [24] the specific case of a linear RDED with a forcing term. In this paper we propose dynamics learning from a sample of multivariate functional data, where the model generating the observed derivative trajectories is assumed to be a RDE with a delay component. The randomness in the RDE is included in the forcing term, part of which is explained by additional covariates. These are stochastic processes with individual delay components, which in the motivating COVID-19 application correspond to trajectories of mobility and economic activity. The unexplained remainder is a drift process, which appears as Z(t) in equation (2). We utilize tools from the theory of RDE to provide a foundation for the proposed population models and establish that all the models described in Section 2 correspond to RDED with unique solutions, which depend on the model parameters. The goal is then to learn these unknown parameters, which govern the dynamics encapsulated in the population RDED, by pooling information across the sample of observed trajectories. The presence of a large number of covariates and delay components to choose from gives rise to the challenge of model selection. To address this we propose an initial pruning step as described in Section 4.2 followed by a backfitting step to optimize over a set of delay components as outlined in Section 4.3. The proposed approach differs from existing optimization strategies [41, 59, 60, 65] and statistical methodology [36] for parameter estimation that has been previously deployed to identify a DDE based on noisy data observed for a single non-stochastic trajectory. To our knowledge, dynamics learning where one has samples of stochastic processes has so far not been explored for the case of an underlying RDED, even for the case of a one-dimensional stochastic process. We furthermore demonstrate in this paper that such an approach is well suited for modeling the growth rates of COVID-19. We consider both discrete and distributed delay models and establish existence and uniqueness of the solutions in the L p sense [14] in Section 3. For the case of discrete delays we harness functional concurrent regression models [13, 33, 51, 53] and for distributed delays the functional history-index model [40, 52] , which incorporates a range of recent past values of the process. For dynamics learning of RDEDs, we adopt a two stage procedure. In a first step we utilize functional linear regression [17, 42, 63] with history index to learn the distributed delay, where the regression parameter function then corresponds to a history index function for the process of interest. In a second step the resulting linear predictor, which is the inner product of the history index function and the predictor process of interest, is used as a predictor along with additional covariate processes with covariate-specific delays in a concurrent model to fit the derivative process. We apply this model to predict the time-dynamic growth rate of COVID-19 for individual states in the United States, where the sample of case trajectories of all states provides the sample of trajectories that is the starting point for the proposed dynamics learning. Predictor processes that we consider include the cumulative case (caseload) process, daily economic indicators and changes in mobility patterns. Modeling the time evolution dynamics of the COVID-19 pandemic is of great importance to understand and interpret the underlying associations as well as for deploying resources and formulating policies in the face of great uncertainty [7, 11, 32] . The proposed methodology is also of interest to assess the dynamics of many other empirically observed complex multivariate stochastic processes for which samples of observed trajectories are available. We show by means of leave-one-out predictions that by employing dynamics learning for the proposed RDED model with delay components one obtains considerably more accurate time-dynamic growth rate predictions for COVID-19 caseload curves compared to models without delays, demonstrating the importance of considering the inclusion of lags when modeling the growth rate of COVID-19. Let (X(·), U(·)) denote a multivariate stochastic process where X(·) is a continuously differentiable process of interest, U(·) = (U 1 (·), . . . , U J (·)) is a vector function of additional covariates, and [t 0 , T ] is a time window of interest. Consider a RDED with discrete delays, where g corresponds to the initial condition stochastic process, the τ j , j = 0, . . . , J, are discrete delays, and α(t), β 0 (t), β j (t) are smooth functions whose regularity will be specified below in Section 3. In the above, Z(·) is a random drift process that is independent of (X(·), U(·)). While the inclusion of discrete delays is an extension of the classical functional concurrent regression model [48] , the functional linear regression model with history index [40, 52] might be a better choice when the derivative trajectories depend not only on the predictor process value at a single past instant but on the entire continuum in the recent past. We model these distributed delays as where g again is an initial condition process. For the purpose of illustration and technical derivations, we assume that U (·) is a univariate process in (4); the corresponding multivariate generalization is straightforward. For COVID-19 modeling a hybrid model that combines distributed delay in X(·) with discrete delays for all other predictors U j (·) turned out to be particularly suitable. In this practically relevant variant of a RDED one postulates that for all t ∈ [t 0 , T ], with an initial condition We remark here that the RDED models proposed above are linear in the model parameters, although general RDED models determined by nonlinear functions are possible and might provide greater modeling flexibility, at the cost of additional modeling complexity. For example, it is not difficult to develop population models for quadratic and polynomial versions, in analogy to functional polynomial models [62] . Modeling with linear RDEDs has the advantages that such models are easy to apply and implement, come with excellent interpretability of the model parameters, and exhibit good empirical performance in our application. For the rest of the manuscript we therefore only consider linear RDEDs and leave the development of nonlinear RDEDs for future research. Here we discuss the existence and uniqueness of the solutions of the RDEDs corresponding to the concurrent model (3) and the history-index functional linear model in (4) , as well as the hybrid model in (5) used in the data application. Consider a complete probability space (Ω, F, P ), where F ⊂ 2 Ω is the σ-algebra on Ω and P is a probability measure. Denote the space of random variables with finite p th moment by L p (p ≥ 1), that is, The notions of p th -moment differentiability and p th -moment Riemann integrability are defined similarly (see [54] Chapter 4, pages 92 and 100). We say that a stochastic process X : T → L p is a solution of an RDED in the L p sense on the interval T if X is p th -moment differentiable on I, p th -moment continuous on T := [t 0 − τ 0 , t 0 ] ∪ T , and X satisfies the RDED including the corresponding initial condition [14] . We require the following assumptions on the coefficient functions and predictor processes in model (3). Calatayud et al. [14] showed the existence and uniqueness of the solution of the following general form of a random delay differential equation with discrete delay τ > 0, given by and established the following result. [14] . If r satisfies the Lipschitz condition , and a k with T t 0 |k(t)|dt < ∞, then the random delay differential equation (6) has a unique solution in the L p sense. (3) is a special case of the random delay differential equation (6) with a specific choice of r. A key step is to show that the function f defined in (3) is p th -moment continuous. The existence and uniqueness of the solutions of (3) can then be obtained following similar arguments to those in the proof of Theorem 2.2 in [14] . We summarize this result in Proposition 1. All proofs are in the Appendix. , R) almost surely, then there exists a version of the L p -solution which solves the RDED in the sample path sense, see Theorem 2.3 in [14] and [16] . Note that in (3), we do not include the process X(t) itself as one of the predictors; included is only the delayed version X(t − τ 0 ), where τ 0 > 0. Hence solutions in the sample path sense do not require g to be differentiable almost surely. For τ 0 = 0, see Remark 3. The explicit sample path solution in the special case of a linear RDED has been derived in [14, 16, 24] . The linear RDED often includes random coefficients and a random time-varying forcing term. In (3), the terms involving the predictor processes U j (·), the drift process Z(·), and the intercept coefficient function α(·) form the constituents of the random forcing term; the varying coefficient functions α, β 0 , β j are non-random. Using the method of steps as outlined in the proof of Theorem 2.3 in [14] , we can provide a solution to (3) in the sample path sense as follows. For t ∈ [t 0 , t 0 + τ 0 ], (3) is equivalent to an ordinary RDE given by where f is defined as in (3). Then, the solution of (7) is given by Solving (8) leads to Repeating this argument until the entire domain T = [t 0 , T ] is covered one obtains the sample path solution. Due to the existence and uniqueness of the solution of (3) in the L p sense and by Remark 1, the sample path solution obtained as above is unique. Remark 3. In applications, the delay parameter can take the value zero and when τ 0 = 0 in (3) one arrives at an ordinary RDE without delay. For this case, existence and uniqueness of solutions has been well studied (see, e.g., Theorem 5.1.1 in [54] and [55] ). The sample path solution in this case can be obtained via the integration factor method and is given by (3). Here q(·) consists of the intercept term α(t), the predictor processes U j (t) along with the corresponding coefficient functions β j (t), j = 1, . . . , J, and the drift process Z(t). We start with the simplest history index model with only one predictor process U (·), characterized as a first order RDED described in (4) . We prove the existence of a unique solution of (4) by adopting similar arguments as those behind Proposition 2.1 and Theorem 2.2 of [14] . The extension to multivariate predictor processes U(·) is straightforward. Observe that one can rewrite model (4) as Here f is a random functional whose first argument is the trajectory The following proposition with proof in Appendix B provides the continuity of f in t in the L p sense. For the random process X on [t 0 − τ 0 , T ] and the time variable t ∈ [t 0 , T ], the history index model (9) can be expressed as For the existence of a unique solution of (11) we need the following assumptions, in addition to (A0)-(A2): and is p th -moment continuous on I . The following result characterizes the solution of (11), where the integral in condition (b) of the following Proposition is defined in the p th -moment Riemann integral sense [for definition of p th -moment Riemann integrability see [54] , Chapter 4 page 100]. Proposition 3. The process X : T → L p is a L p solution of (11) if and only if for τ 0 > 0, The following result establishes the existence and uniqueness of a solution for (11) . The proof extends the classical Picard theorem for deterministic ODE to RDED in the L p sense, via the Banach fixed-point theorem, following a similar line of arguments as in [14] (see Appendix B). and a solution of (4) exists and is unique in the L p sense. For the existence and uniqueness of the solution of the hybrid model in (5), which is the most promising of the models considered for modeling the growth rate of COVID-19, the following corollary applies. In data-driven differential equation modeling, given a sample of observed processes, a necessary first step is the estimation of derivatives. In applications such as the COVID-19 case trajectories the available data are usually noise-contaminated, and therefore some care is needed to obtain viable derivative estimates. The situation for the observed data for one of the trajectories is reflected by the nonparametric regression model where {t k } K k=1 are the grid points where the process X is measured and k are the measurement errors, which typically are considered to be independent mean zero random variables with finite variance. In our data application, the available data Y k correspond to the cumulative case counts of COVID-19, which we consider to be noisy realizations of a smooth underlying function X(·) that is observed on a daily grid. For the practical estimation of the derivatives X (t) one has various choices that all require a tuning parameter. Since in preliminary studies local polynomial fitting yielded the smallest leave-one-out prediction error relative to the difference quotients, which serve as nearly unbiased targets, we opted to obtain derivatives by applying local polynomial regression [28, 44] as per Appendix C, where further details are provided. For modeling the COVID-19 caseload process X for the states of the U.S. we adopt the RDED model (5), where the time delay of the effect of the process X at time t ∈ T is distributed on the interval [t − τ 0 , t], and {τ j } J j=1 are discrete time delays or time lags for the covariate or predictor processes U j . Our starting point is that the data {(X i , U i1 , . . . , U iJ )} n i=1 represent n independent realizations of the stochastic process (X, U 1 , . . . , U J ), where each realization (corresponding to a state) is indexed by i and n is the number of states included in the analysis. Given τ 0 , to estimate the history index weight function γ(s, t) in the historical model without covariates, we preform a function-to-scalar linear regression for each grid point t k , i.e., This can be implemented using function FLM in the R package fdapace [19] . With estimates of γ(s, t k ) for each k in hand, the estimate of the history index weight function, γ(s, t) for all s, t ∈ T , is obtained using a two dimensional kernel smoother. For each of the covariates U j , we conduct an initial lag selection based on functional varying-coefficient regression including the corresponding variable as a single predictor, considering the models where U 0 := X. Given a lag τ , for each subject i and time point t k , we obtain the leave-one-out prediction where α j,−i (t k ) and β * j,−i (t k ) are estimated by fitting model (14) by ordinary least squares, excluding the data from subject i. The lags are chosen to minimize the leave-one-out prediction error across subjects, With initial choices of the lags { τ j } J j=1 and the estimate γ obtained from model (13), for each t k , variable selection is then performed using LASSO [56] , where β k = (β 0k , . . . , β Jk ). This can be implemented using the R package ncvreg [9, 10] , where the tuning parameter λ is chosen by leave-one-out cross-validation. Considering all K time points simultaneously, we obtain proportions of the number of time points t k where each variable U j is selected, i.e., p j = #{k :β j (t k ) = 0}/K, for j = 1, . . . , J. Subsequently, selection among the time-varying predictors {U j } J j=1 other than the process X is performed by applying a threshold on these proportions {p j } J j=1 . This threshold p * is chosen by leave-one-out crossvalidation, Let V denote the set of selected predictor processes obtained using the method described in Section 4.2. Updated lag parameters are then chosen for the variables in V through an iterative backfitting algorithm such that the total leave-one-out prediction error is minimized. Given the b th iteration of the lag vector, one obtains the (b + 1) st update for the k th lag by minimizing the leave-one-out integrated mean squared prediction error, i.e., Here the functions β (b) k,−i (t) represent the fitted coefficients corresponding to the b th iteration of the j th lag, j ∈ V with the i th observation held out for prediction. This process is iterated over k ∈ V until the lag vector converges. Lags are initialized at the vector τ (0) = ( τ 1 , . . . , τ j , . . . , τ |V| ) T , where τ j are as described in Section 4.2 and the stopping rule is such that the algorithm terminates if no lags have changed after a complete cycle of updates for all predictors. After the final lags are estimated, we lightly smooth the estimated coefficients with local linear smoothing, which is a standard procedure in concurrent functional regression modeling [25, 29] . Specifically, the initial estimates β j (t k ) are smoothed to obtain the effect functions presented in Figs. 3 and 4, defined as βsmooth with K(·) representing the Epanechnikov kernel with a bandwidth of h = 20 days. We obtained daily confirmed cases across states in the United States from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which is publicly available at https://github .com /CSSEGISandData /COVID -19 and was accessed on September 19, 2020. The positive and negative COVID-19 test results per day and state were obtained from the COVID Tracking Project and are publicly available at https://COVIDtracking .com/ (accessed on September 19, 2020). The former variable refers to the number of people with confirmed or probable case of COVID-19 (see https:// COVIDtracking .com /about -data /data -definitions for more details) and the latter refers to the total number of people with a completed negative PCR test. We used effective positivity rates (EPR), defined as the ratio of effective positive results (positive test or probable case) to the effective total tests (positive or probable case plus negative test results). net revenue for small businesses among middle(middle two quartiles) income ZIP codes 33 net revenue for small businesses in transportation 34 net revenue for small businesses in education and health services 35 net revenue for small businesses in leisure and hospitality We also obtained features from the Opportunity Insights Economic Tracker Data [21] , which is publicly available at https://github .com /OpportunityInsights /EconomicTracker (accessed on September 19, 2020). This database contains daily and state level information for several economic activity indicators such as credit and debit card expenditure across different types of activities, some of them stratified by zip codes with low/medium/high income level. We refer to [21] for more details. Additionally, we downloaded Google mobility data, publicly available at https://www .google .com /COVID19 /mobility/ (accessed on September 19, 2020). These data include indicators of mobility pattern changes in different areas such as parks, residential locations, retail locations, among others; and information such as the percent change in revenue as well as total number of small businesses for high/low income consumers and across different economic activities; see Table 1 for a complete listing. We briefly describe a few features below and refer to the Economic Tracker source [21] for further details: • 'spend all': Seasonally adjusted credit and debit card spending relative to the period Jan 4 to Jan 31, 2020, in all merchant categories, reported as a seven day moving average formed by using the values for the current and the previous six days. • 'spend tws': Seasonally adjusted credit and debit card spending relative to the period Jan 4 to Jan 31, 2020, in the category transportation and warehousing, and also reported as a seven day moving average. • 'spend all inchigh': Seasonally adjusted credit and debit card spending by consumers living in high median income (above 75% quantile) zip codes, and relative to the period Jan 4 to Jan 31, 2020, in all merchant categories, again reported as a seven day moving average. • 'revenue all': Seasonally adjusted percent change in net revenue for small businesses and indexed to the period Jan 4 to Jan 31, 2020, reported as a seven day moving average. • 'revenue ss70': Seasonally adjusted percent change in net revenue for small businesses in leisure and hospitality and indexed to the period Jan 4 to Jan 31, 2020, reported as a seven day moving average. • 'merchant inchigh': Seasonally adjusted percent change in the number of small businesses open in high median income (above 75% quantile) zip codes and indexed to the period Jan 4 to Jan 31, 2020, reported as a seven day moving average. • 'gps parks': Corresponds to the time spent at parks relative to a baseline which is defined as the median value, for the corresponding day of the week, during the period Jan 3-Feb 6, 2020. We employ local linear regression to slightly smooth the feature trajectories; this also allows to impute missing observations for some states and days. For this step, a Gaussian kernel with bandwidth 1.5 days was employed and we utilized the R package fdapace version 0.5.5 for the computational implementation [19] . The effective test positivity rate was smoothed analogously. We use local quadratic regression [28] to obtain the derivatives X (t), implemented by using the R package locfit version 1.5-9.4 [39] . For data analysis we consider states with population at least 1 million as the states with smaller populations had less COVID-19 cases, and their inclusion would therefore increase the overall noise level in the data and negatively impact the analysis. This is a pragmatic choice and is not a limitation of the proposed method. In the end, we included the data from 44 states (excluding Alaska, Delaware, North and South Dakota, Vermont and Wyoming). As a result of the LASSO variable selection described in Section 4.3, which was implemented with the R package ncvreg version 3.12.0 [9] with threshold p * = 0.3 in (16), the features utilized in the final model are as follows, where the selected lags are listed in Table 2. 1. X i (t), the total cases per million at time t for state i, i = 1, . . . , 44. 2. U 1i (t), the relative time spent at park areas at time t in state i. 3. U 2i (t), the number of hospitalized people at time t in state i. 4. U 3i (t), the "positive testing rate" defined as the ratio of effective positive results (positive test or probable case) to the effective total tests (positive or probable case plus negative test results) at time t for state i. 5. U 4i (t), the credit/debit card relative spending in the arts, entertainment, and recreation category (see 'spend aer' predictor in Table 1 ) at time t in state i. In order to study the delay time dynamics of COVID-19 in the United States, we focus on the statespecific time-varying cumulative confirmed cases per million people, X(t), for t ∈ T , which is referred to as the case process in the following. Here, the time domain T that we consider starts on April 5 and ends on August 12 in 2020, where data are recorded once per day, on an equidistant grid for 130 days. We estimate the history coefficient function γ in the RDED model (5) as described in Section 4.2 with τ 0 = 14, and adopt the functional varying-coefficient regression in model (5) with J = 34 other variables as listed in Table 1 . The estimated coefficient surface γ(s, t) and some of its cross-sections are displayed in Fig. 1 . For each fixed t, γ(·, t), the estimated history index in model (5) is found to change from positive near current time to negative with increasing lags. This means that the recent past of the process has a positive association with the current derivative while days further into the past are negatively associated. The right panel shows that the shape of γ(·, t) changes from more or less quadratic to linear from the early days of COVID-19 to recent times, indicating that in April 2020 the contrast between the effects of near-current and past caseload on the current derivative was more pronounced than it is more recently. The results are visualized in Fig. 2 . States which are suffering from faster spread of the virus are characterized by higher growth rates. The historically weighted integrated case count per million has a positive effect on the growth rate. This reflects dynamic explosive behavior [45] , so that states will tend to have caseloads that move even further away from the mean caseload per million across states as time moves forward, which could be either moving further above the average across states or further below it. Thus the positive effect of the history index reflects increasing variability in the caseloads across states. Potential reasons for this are the distinct policies that states implement in terms of mobility, business and social restrictions, as well as cultural acceptance of such restrictions. Spatial infectivity dynamics with waves of infections moving spatially across the U.S. also might play a major role. In terms of the effects of covariate processes, our results indicate that higher park mobility during late spring and summer (May onwards) is associated with a decrease in growth rates. For each percent increase in park mobility, the growth rate decreased by as much as 10 cases per million per day, with the effect reaching its maximum in mid-summer. This indicates negligible risk and potential benefits from outdoors recreation activities. The other covariate processes that were selected by LASSO as predictors in the RDED model (5) are patient counts at hospitals, the testing positivity rate, and credit card spending on recreational activities, including dining at restaurants. All of these were found to have positive effects on the growth rate throughout nearly the entire time domain. These effects were included in model (5) with covariate process-specific lags, as per Table 2 . The number currently hospitalized was found to be most predictive for the case growth rate when including a two-week lag, whereas for test positivity rate a one-week lag was found to be most predictive. This may suggest that individuals who are infected by individuals with less severe symptoms (i.e. a non-hospitalized COVID-positive person) tend to learn of their disease one week after their infector did. The two-week difference for hospitalized individuals is likely a reflection of the time needed to develop serious symptoms which warrant admission for medical care. The effect of recreational spending on viral spread was much more immediate; the optimal lag in this case is only 3 days. This suggests that states with higher recreational economies during the pandemic see an almost immediate uptick in growth rate, with cases increasing by as much as 25 infections per million per day for each percent of additional recreational spending. Figs. 3 and 4 display the fitted growth rates provided by the RDED model (5) , with covariate specific lags, and also the results of fitting a differential equation model where the covariates do not have individual lags, i.e. a concurrent RDE model given by (5) in which τ j = 0, j = 0, 1, 2, 3, 4. Fitted curves are the result of leave-one-out predictions for each state, where the data from the left-out state were not used for fitting the model and then the state's data are plugged into the fitted model to obtain the predicted case growth rate. These predictions are compared against the actual observed growth rates as obtained by local polynomial regression (see Section 4.1 and Appendix C). Visual inspection (as well as measures of goodness-of-fit, such as integrated mean squared prediction error) indicates that the incorporation of delays with predictor specific lags in the model substantially improves the fits. This provides compelling evidence for modeling COVID-19 growth rates with RDEDs that include covariate-specific lags. A vanilla RDE without predictor delays does not incorporate historical information beyond the previous day and struggles to explain deviations in phase (as in, e.g. Illinois). The problem of modeling phase variation is instead foisted upon the coefficient functions, which are ill-equipped to reflect it, and thus at times the straight RDE model without predictor delays falters in terms of amplitude modeling as well (e.g. for Colorado). In contrast, the proposed RDED incorporates past information through the included delays and thus reflects phase variability in a natural way, especially through the inclusion of distributed delays via the history index. Including a two-week history of case counts through the history index function γ in model (5) substantially enhances the flexibility of the model. The estimates of coefficient functions are no longer uncontrollably altered by the effects of unaccounted lags, and as a result the fitted model performs much better. A comparison of observed vs. RDED model-predicted growth rates allows one to assess the performance of states in terms of controlling spread. A state whose observed growth rate falls below the predicted curve corresponds to a negative residual and suggests better-than-expected viral control, and vice versa for observed rates which fall above the predicted curve. Figs. 5 and 6 display the residual curves for the fitted RDED model (5) , where positive residuals correspond to the number of excess new cases per day relative to the predictions obtained from fitting model (5), as described above, including the two-week history index and other covariate processes with their respective lags. These residuals reflect the unexplained residual stochastic process Z that is an integral and principally unpredictable part of the RDED model (5) on which our predictions are based. Major outbreaks, which can be interpreted as an unforeseen amount of excess cases, are revealed through positive spikes in the residual plots. For example, New York's residuals leap up to as high as 170 excess cases per million per day during a spike in Spring. Arizona's summer outbreak is captured in the residuals as well, subsequently course-correcting with a streak of fewer-than-expected new cases around day 100. These spikes also suggest that in the two weeks prior to the uptick, there were substantially more cases in the state population than the numbers reported in the JHU dataset. The consistency of a state's handling of the virus can be seen through the volatility of the residual curve. States like Colorado and Oregon exhibit only minor deviations from zero, which suggests their new cases are quite predictable given a two-week history of the state case counts and covariate processes. Some states, particularly in the Northeast, including New York, New Jersey, and Rhode Island, only exhibit high volatility during the early days of the pandemic before achieving a period of stability. On the other hand, states such as Alabama and Louisiana exhibit less-predictable case counts throughout. The codes for reproducing the above analysis are available at https://osf .io /w48zd /?view _only = 7cd3e2c686a74fd086e4eae8534214c0. The run times of the different steps in the analysis using R version 3.6.3 (2020-02-29) running under CentOS Linux 7 (Core) on x86_64-pc-linux-gnu (64-bit) platform are summarized in Table 3 . In this paper we propose dynamics learning exemplified by learning a RDED from a sample of observed trajectories, which is motivated by modeling COVID-19 daily new cases from the previous behavior of the case trajectory itself through a distributed delay or history index component and additional covariate Table 3 Run time for the different steps in the data analysis. Step process components. RDEDs form the driving mechanism of real life processes that include a feedback loop into the future. Depending on the application and the exact nature of the feedback, one might incorporate discrete delays or distributed delays. While discrete delays borrow information from some isolated times in the past, a distributed delay takes into account the entire continuum in the recent past and turns out to be closely related to a functional history index model that has been considered in the statistics literature. In the current framework, we have not included inference for the estimated model coefficient functions and the lags. It is left for future research to investigate the convergence of these estimated parameters to their population targets and to obtain guarantees on the convergence rates. Some words of caution are in order. Any method that includes a time delay component suffers from the limitation that a larger set of possible lags means one has to sacrifice a part of the data that are available for the analysis, due to the initialization. We have assumed that the trajectories corresponding to the different states are i.i.d. realizations, which may not hold if there is significant spatial correlation based on geographic and demographic similarities across the states. It would be interesting to study adaptations of the proposed methodology when the samples are correlated instead of being independent. From an application point of view, a different and perhaps more refined perspective could potentially be gained by applying the proposed method at a finer level to county level data. Establishing the existence and uniqueness of the solutions of RDEDs has found increasing interest in recent years [14, 24] . We contribute here to the forward problem of obtaining solutions of an RDED by targeting the inverse problem analytically and combine this with functional data analysis approaches to estimate the parameter functions from an observed sample of multivariate stochastic processes for both discrete and distributed delay setups. Our approach builds on and contributes to empirical dynamics and the literature on inferring dynamics and differential equations from functional data, and extends it to models that include time delays. A key feature of the proposed approach is the statistical estimation of the model parameters in the population RDED, which has inbuilt model selection steps that aid in estimating the optimal lags and also to select the important predictors through the lasso pruning steps, with majority voting across all time points. This reduces the number of predictors in the model and thus its complexity prior to the optimization step of the lags that are associated with the predictors. To handle this complex lag optimization step, instead of a brute force search method, we adopted a backfitting algorithm, which significantly simplifies the optimization step and also leads to faster convergence. We provide a complete toolkit that starts with a sample of multivariate trajectories and extracts from the observed data an estimated RDED that best explains the dynamics inherent in the data, along with automatic variable selection and delay estimation. The application of our method to the modeling of COVID 19 growth rates in the United States demonstrates excellent performance in terms of predictions of the trajectories. The predictions from the random differential equation model without predictor lags grossly overestimates the COVID-19 growth rates for significant periods of time in New York, New Jersey, Pennsylvania, Massachusetts, Connecticut, Virginia, West Virginia, New Hampshire, California, Texas, Colorado, Oregon, Washington, Maine and Hawaii but underestimates the growth rates for Alabama, Arizona, Arkansas, Florida, Louisiana, Nebraska, Nevada, South Carolina, Tennessee and Wisconsin; see Figs. 3 and 4. This highlights the importance of incorporating time lags in differential equation modeling of functional data, when one has a sample of realized stochastic processes, specially when the underlying processes tend to have an aftereffect. Appendix A. Proof of the existence of a solution for the model in (3) Proof of continuity of f . The unique solution of the first order DDE in (3) can be shown to exist as a direct extension of Theorem 2.2 of [14] . We first show that f (y, t) is p th -moment continuous, using Assumptions (A0) -(A2) in Section 3.1. Observe that (E|f (y m , t m ) − f (y, t)| p ) 1/p = (E|α(t m ) − α(t) + β 0 (t m )y m − β 0 (t)y + Z(t m ) − Z(t) (A.1) The first and third terms in (A.1) converge to 0 using Assumptions (A0) and (A1) respectively. For the second and fourth terms of (A.1), ||β 0 (t m )y m − β 0 (t)y|| p ≤ ||β 0 (t m ) y m − β 0 (t m ) y|| p + ||β 0 (t m ) y − β 0 (t) y|| p ≤ |β 0 (t m )| ||y m − y|| p + |β 0 (t m ) − β 0 (t)| ||y|| p . (A.2) By Assumption (A0), β 0 (t m ) is continuous on the compact domain T , and hence uniformly bounded. Also, by assumption y m → y, in the p th -moment. Since L p , the space of random variables with finite p th moments, is a Banach space, it follows that ||y|| p := (E|y| p ) 1/p < ∞. Thus (A.2) converges to 0. Next The whole curve x (ν) (·) is obtained by running the above local polynomial regression with t varying in an appropriate estimation domain. For local polynomial fitting L −ν preferably is taken to be odd as shown in Ruppert & Wand [50] and Fan & Gijbels [28]. To obtain an estimate of the first derivative, i.e., for ν = 1, the choice L = ν + 1 = 2 leads to the so-called local quadratic regression and the derivative estimate x (t) is given by the local slope θ 1 . Other common methods for derivative estimation can be based on smoothing splines or B-splines [49,64]. An alternative method is based on difference quotients, which provides a straightforward approach for pointwise estimation of derivatives Nonlinear differential-difference equations Stochastic Differential Equations Analysis of a system of linear delay differential equations Numerical solution of a class of delay differential and delay partial differential equations via Haar wavelet Numerical Methods for Delay Differential Equations Global asymptotic stability of an SIR epidemic model with distributed time delay The challenges of modeling and forecasting the spread of COVID-19 A review of time-delay estimation techniques ncvreg: regularization paths for SCAD and MCP penalized regression models Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection Transmission dynamics reveal the impracticality of COVID-19 herd immunity strategies Parameter estimation of ODE's via nonparametric estimators Efficient estimation and inferences for varying-coefficient models Random differential equations with discrete delay On a predator prey model with nonlinear harvesting and distributed delay Applying the random variable transformation method to solve a class of random linear differential equation with discrete delay Functional linear model fdapace: functional data analysis and empirical dynamics Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics The Opportunity Insights Team, The economic impacts of COVID-19: evidence from a new public database built using private sector data Theory of Ordinary Differential Equations Numerical solution of random differential equations: a mean square approach L p -solution to the random linear delay differential equation with a stochastic forcing term Generalized varying coefficient models for longitudinal data Ordinary and Delay Differential Equations Recursive estimation for system of unknown delay Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability Two-step estimation of functional linear models with applications to longitudinal data Velocity and acceleration of height growth using kernel estimation Nicholson's blowflies revisited Dynamic modeling of multivariate longitudinal data Polynomial spline estimation and inference for varying coefficient models with longitudinal data On a formula concerning stochastic differentials Discrete time techniques for time delay estimation Modeling CD4+ T cells dynamics in HIV-infected patients receiving repeated cycles of exogenous Interleukin Parameter estimation of ordinary differential equations Parameter estimation for differential equation models using a framework of measurement error in regression models locfit: local regression, likelihood and density estimation. R package version 1.5-9 The historical functional linear model Parameter estimation of delay differential equations: an integration-free LS-SVM approach Functional regression Weighted local regression and kernel methods for nonparametric curve fitting Bandwidth choice and confidence intervals for derivatives of noisy data Empirical dynamics for longitudinal data Random Differential Equations in Scientific Computing Semiparametric modeling of autonomous nonlinear dynamical systems with application to plant growth Functional Data Analysis Smoothing splines: regression, derivatives and deconvolution Multivariate locally weighted least squares regression Generalized varying coefficient models for longitudinal data Functional varying coefficient models for longitudinal data Varying coefficient models for sparse noise-contaminated longitudinal data Random Differential Equations in Science and Engineering Random ordinary differential equations Regression shrinkage and selection via the lasso Inferring stochastic dynamics from functional data Functional data analysis Estimating parameters in delay differential equation models Adaptive semiparametric Bayesian differential equations via sequential Monte Carlo On the controllability of delay-differential systems Functional quadratic regression Functional linear regression analysis for longitudinal data On derivative estimation in spline regression Statistical Inference of Distributed Delay Differential Equations Semiparametric stochastic modeling of the rate function in longitudinal studies Appendix B. Proof of the existence of a solution for the model in (4) For ease of presentation, we will begin by showing the existence of the unique solution of the first order RDED (4) with one predictor process U (·), rewritten as in (11) . The arguments can then be easily extended to the case of multiple predictor processes, and we omit the details.Proof of Proposition 2. Recall that t m ∈ [t 0 , b] is a real sequence converging to t ∈ [t 0 , b] and Y is an L p -continuous process, so that(B.1)Using Assumptions (A0) and (A1), respectively, the first and second term of (B.1) converge to 0. As before, we consider the third and fourth terms separately,For the first term in (B.2),where the first inequality follows from the fact that the integrand is p th moment continuous, using Assumption (B0) and the assumption that Y (·) ∈ L p (see [54] page 102 (3)). The second inequality follows using the properties of L p Riemann integration. Now, from Assumption (B0), we have that for any > 0 there existsFrom [54] page 103 (5), each term in the RHS of the above equation is p th -moment continuous. Hence, by, and the second term in (B.2) also converges to zero as m → ∞. As for the convergence of the remaining fourth term, involving the predictor process U (·) in (B.1), we can follow a similar line of argument to obtainThe p th -moment continuity of the process f (Y, ·) follows by definition.As a consequence of the above result, we observe that f (X, t) in (11) is continuous in t ∈ T = [t 0 , T ] in the L p sense, by simply choosing b = T .Proof of Proposition 3. Suppose X is an L p solution of (11) . Then by definition, (a) and (c) hold. Since X is p th -moment continuous (see Assumption (B1)) and the process f (X, ·) is L p -continuous on T = [t 0 , T ], it follows from [54] page 101(1) that X is p th -moment continuous and p th -moment Riemann integrable on T . Thus from the Fundamental Theorem of L p Calculus ( [54] page 104(6))(X, s)ds, and condition (b) holds. On the other hand, if conditions (a)-(c) are true, X is indeed an L p solution of (11), since the process f (X, ·) was shown to be continuous. Thus f (X, ·) is p th -moment Riemann integrable on T ([54] page 101 (1)), and t t 0f (X, s)ds is well defined in the L p sense. From the result on page 103 (5) of [54] , it follows that X is p th -moment differentiable on T , with X (t) =f (X, t). Hence the proposition follows.Proof of Theorem 1. From the history index model in (11) − s, t) |ds, since γ is continuous on a compact domain, hence uniformly bounded, we have k ∈ L 1 (T ). Following the arguments outlined in the proof of Theorem 2.2 in [14] , this implies lim t→t + 0 t t 0 k(s)ds = 0. Thus, we can choose α > t 0 such that, for all t ∈ T α = [t 0 , α] ⊂ T , t t 0 k(s)ds ≤ 1/2. As shown in the proof of Theorem 2.2 of [14] , A is a Banach space. Consider the map Λ : A → A such thatIf X ∈ A, it follows by taking b = α in Proposition 2 that the process f (X, t) is p th -moment continuous over t ∈ T α . From arguments on p. 103 (5) of [54] , the p th -moment Riemann integral (L p ) t t 0f (X, s)ds is continuous. Also, the initial condition g(·) is continuous on. By definition ΛX ∈ A, so Λ is well defined.Since from Proposition 3, X : T α → L p is a solution of equation (11) if and only if X ∈ A and ΛX = X, by the Banach fixed-point theorem, it suffices to check that Λ is a contraction. Let X, Y ∈ A and t ∈ T α and observeProof of Corollary 1. Similar to (10) , define h :Here h is a random functional with first argument given by trajectories {Y (s) : s ∈ [t 0 − τ 0 , b]} and second argument t ∈ [t 0 , b]. Note that h constitutes a distributional delay term on Y and a discrete concurrent delay on the predictors U j . ThusBy similar arguments as in the proof of the continuity of f in Appendix A and the proof of Proposition 2 in Appendix B, the continuity of h in the L p sense in t follows. The existence and uniqueness follows by the same arguments as in the proof of Theorem 1. For derivative estimation, we employ the local polynomial estimator for derivatives, motivated by local polynomial approximation [28, 43] , which leads to the weighted least squares estimates θ l that correspond to solvingwhere K is a kernel function with K h (·) = K(·/h)/h and h is a tuning parameter. A well-studied estimator for the ν-th order derivative x (ν) (t) is given byx