key: cord-0528579-50b6flxg authors: Hartl, Tobias title: Monitoring the pandemic: A fractional filter for the COVID-19 contact rate date: 2021-02-19 journal: nan DOI: nan sha: d7f5f0ed37ed13bf3059813d8d5312be12b60cd4 doc_id: 528579 cord_uid: 50b6flxg This paper aims to provide reliable estimates for the COVID-19 contact rate of a Susceptible-Infected-Recovered (SIR) model. From observable data on confirmed, recovered, and deceased cases, a noisy measurement for the contact rate can be constructed. To filter out measurement errors and seasonality, a novel unobserved components (UC) model is set up. It specifies the log contact rate as a latent, fractionally integrated process of unknown integration order. The fractional specification reflects key characteristics of aggregate social behavior such as strong persistence and gradual adjustments to new information. A computationally simple modification of the Kalman filter, termed the fractional filter, is introduced. It makes UC models with richer long-run dynamics estimable and provides a closed-form expression for the prediction error of UC models. Based on the latter, a conditional-sum-of-squares (CSS) estimator for the model parameters is set up and shown to be consistent and asymptotically normally distributed. The resulting contact rate estimates for several countries are well in line with the chronology of the pandemic and make it possible to identify different contact regimes generated by policy interventions. As the fractional filter is shown to provide precise contact rate estimates at the end of the sample, it bears great potential for monitoring the pandemic in real time. Since the outbreak of COVID-19, reducing social contacts has been widely viewed as the key way to contain the spread of the virus.
In terms of the Susceptible-Infected-Recovered (SIR) model, this relates to the contact rate, defined as the average number of contacts per person per time unit multiplied by the probability of disease transmission between a susceptible and an infected individual (Hethcote, 2000). The probability of disease transmission should only depend on characteristics that are specific to the virus. Therefore, the contact rate can be interpreted as a proxy for aggregate social behavior and is the key variable addressed by social distancing measures. Knowing the trajectory of the contact rate would make it possible to draw inference on the impact of policy measures on contact reduction, to monitor the dynamics of virus dispersion in real time, and to design policy rules based on the current pandemic situation. Since the contact rate itself is unobservable, appropriate methods to estimate it are required, and such methods are considered in this paper. At the early stage of the pandemic, first estimates for the natural logarithm of the contact rate were obtained by fitting a deterministic, linear time trend with structural breaks to transformations of data on confirmed, recovered, and deceased cases (Hartl, Wälde and Weber, 2020; Lee et al., 2021; Liu et al., 2021). Modeling the log contact rate by a piece-wise linear time trend was a reasonable and pragmatic approximation given the short time series on case numbers available at that time. However, it implies that contact rate growth evolves deterministically as a straight line with jumps at the break dates, an assumption that is likely to be violated by the behavior of individuals. While structural breaks may be suitable for identifying turning points of the contact rate, they are inappropriate for monitoring the current pandemic situation, as breaks require at least some post-break observations to be well identified. This paper aims to improve estimates for the contact rate of COVID-19 by taking into account key features of aggregate social behavior.
In detail, the log contact rate, denoted by log β_t, is modeled as an unobserved, fractionally integrated process of (unknown) order d ∈ R_+, generated by stochastic shocks {η_i}_{i=1}^t. The stochastic specification of the contact rate is motivated by the consideration that social decisions, e.g. on whether to meet, are made conditional on the information available at that time, e.g. on current social distancing measures or the state of the pandemic. As information does not evolve deterministically but appears as stochastic shocks, this suggests treating log β_t as a stochastic process generated by the information shocks {η_i}_{i=1}^t. Specifying log β_t as a fractionally integrated process accounts for strong persistence and nonstationarity (in short: long memory) of social behavior. In contrast to structural breaks, but also to random walks, the fractional specification allows social behavior to adjust gradually to new information both at the individual and at the aggregate level. Individually, this reflects a gradual reduction or increase of contacts as new information becomes available (e.g. as new contact restrictions are imposed), while on aggregate it allows individuals to react heterogeneously, both in terms of speed and intensity, to novel information. As the persistence of the log contact rate is unknown, the integration order d is treated as an unknown parameter to be estimated. Methodologically, this paper contributes to the literature on time series filtering by setting up a novel unobserved components (UC) model that does not require prior knowledge about the integration order of the variable under study. Current UC models and related filtering techniques rely heavily on prior assumptions about the integration order d and typically assume d = 1 (e.g. Harvey, 1985; Morley et al., 2003; Chang et al., 2009) or d = 2 (e.g. Clark, 1987; Hodrick and Prescott, 1997; Oh et al., 2008) to be known.
In contrast, the novel UC model reflects that the degree of persistence of the log contact rate is unknown. It decomposes a noisy measurement for the log contact rate, based on a transformation of data on confirmed, recovered, and deceased cases, into measurement errors, seasonal components, and the unobserved log contact rate itself. As the latter is modeled by a fractionally integrated process, the model is called the fractional UC model. The second methodological contribution of this paper is to derive a computationally much simpler estimator for the model parameters and the unobserved components compared to current state space methods. Current methods typically rely on the Kalman filter to set up a conditional (quasi-)likelihood function for the estimation of the model parameters. Given the parameter estimates, a time-varying signal for the unobserved components is then obtained from the Kalman smoother. Both the Kalman filter and smoother become computationally infeasible when the dimension of the state vector of UC models is high, as it is for fractionally integrated processes. To address this problem, this paper proposes a computationally simple modification of the Kalman filter and smoother that is termed the fractional filter. While filtered and smoothed estimates from the fractional filter are identical to those of the Kalman filter and smoother, the fractional filter avoids the computationally intensive recursions for the conditional variance. The fractional filter provides a closed-form expression for the prediction error of UC models, based on which a conditional-sum-of-squares (CSS) estimator for the fractional integration order and the other model parameters is set up. While the CSS estimator has been found useful for the estimation of ARFIMA models, see Hualde and Robinson (2011) and Nielsen (2015), it has not been considered in the UC literature so far.
The CSS estimator minimizes the sum of squared prediction errors, which is proportional to the exponent in the conditional (quasi-)likelihood function based on the Kalman filter. Due to the computational gains from the fractional filter, the CSS estimator makes UC models with richer long-run dynamics estimable. The paper provides the asymptotic theory for the CSS estimator, showing it to be consistent and asymptotically normally distributed, while the finite sample properties are assessed by a Monte Carlo study. Using data from the Johns Hopkins University Center for Systems Science and Engineering (Dong et al., 2020; JHU CSSE), estimates of the contact and reproduction rates are presented for Canada, Germany, Italy, and the United States, where the benefits of the new methods become directly apparent: First, estimation results are not only well in line with the chronology of the pandemic, but also make it possible to identify different contact regimes generated by the strengthening and easing of contact restrictions. Second, a recursive window evaluation shows contact rate estimates at the end of a truncated sample to largely overlap with those based on the full sample information. This makes the fractional filter a suitable candidate for monitoring outbreaks at the current frontier of the data. And third, the proposed estimation and filtering techniques are shown to be fairly robust to under-reporting of recovered cases, which is of particular importance for the US, as several states do not report data on recovered individuals. While under-reporting heavily biases contact and reproduction rate estimates downward in Lee et al. (2021), this is shown not to be the case for the fractional filter. The remainder of the paper is organized as follows: Section 2 motivates the specification of the contact rate and sets up the fractional UC model. Section 3 introduces the fractional filter for log β_t, covers parameter estimation via the CSS estimator, and presents the asymptotic theory.
Section 4 contains empirical results for Canada, Germany, Italy, and the United States, while section 5 concludes. The appendices include proofs for consistency and asymptotic normality of the CSS estimator as well as a Monte Carlo study on the finite sample properties. To motivate the estimation of the contact rate, consider the discrete SIR model, augmented to include deaths, which also forms the starting point of Pindyck (2020, eqn. 1-4) and Lee et al. (2021, eqn. 2.1):

S_t + I_t + R_t + D_t = 1,    (1)
ΔI_t = β_t S_{t−1} I_{t−1} − γ I_{t−1},    (2)
ΔD_t = γ_d I_{t−1},    (3)
ΔR_t = γ_r I_{t−1}.    (4)

In (1), the (initial) population size, normalized to one, is decomposed into S_t, the proportion of the population susceptible in t, I_t, the fraction of the population infected in t, D_t, the fraction that has died until t, and R_t, the proportion that has recovered until t. In (2), γ = γ_d + γ_r denotes the rate at which the infected either die, see (3), or recover, see (4), and obviously γ_d, γ_r ≥ 0. Thus, γ I_{t−1} denotes the fraction of outflows from infected at t. The fraction of new infections at t is captured by β_t S_{t−1} I_{t−1}, where S_{t−1} I_{t−1} can be interpreted as the average probability of a contact being between a susceptible subject and an infected subject. β_t > 0 is called the contact (or transmission) rate. It equals the average number of contacts per person per time unit multiplied by the probability of disease transmission between a susceptible and an infectious person (Hethcote, 2000). As in Lee et al. (2021), the contact rate is allowed to be time-varying. This reflects that social behavior changes over time, e.g. in response to policy changes or to novel information on the pandemic. Since the contact rate determines inflows into infected, see (2), it is the key variable tackled by social distancing policies. Based on the contact rate, the reproduction rate R_t = β_t/γ can be derived. It is the average number of infections caused by an infected subject during the infectious period 1/γ at the early stage of the pandemic (where S_{t−1} ≈ 1).
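The bookkeeping in (1)-(4) can be made concrete with a few lines of code. This is a minimal sketch; the parameter values below are illustrative assumptions, not estimates from the paper, and β is held constant for simplicity although the paper treats it as time-varying:

```python
import numpy as np

def simulate_sir(beta, gamma_r, gamma_d, n, i0=1e-4):
    """Iterate the discrete SIR model with deaths, eqns (1)-(4).
    The population is normalized to one, so S + I + R + D = 1 every period."""
    S, I, R, D = 1.0 - i0, i0, 0.0, 0.0
    path = [(S, I, R, D)]
    for _ in range(n):
        new_inf = beta * S * I                 # inflow into infected, eqn (2)
        dR, dD = gamma_r * I, gamma_d * I      # outflows, eqns (3) and (4)
        S, I, R, D = S - new_inf, I + new_inf - dR - dD, R + dR, D + dD
        path.append((S, I, R, D))
    return np.array(path)

# illustrative (assumed) parameter values
path = simulate_sir(beta=0.25, gamma_r=0.09, gamma_d=0.01, n=200)
```

Because every flow enters once with a positive and once with a negative sign, the adding-up constraint (1) is preserved mechanically at every step.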
R_t is an indicator for the current dynamics of the pandemic, as for R_t < 1 outflows from infected exceed inflows, causing I_t to converge to zero, see (2), where 0 ≤ S_{t−1} ≤ 1. Thus, if policy seeks to contain the spread of COVID-19, then it must control the contact rate, which in turn controls the reproduction rate R_t. As shown by Lee et al. (2021), from (1) to (4) a measurement for the contact rate β_t can be obtained directly: Denote by C_t = I_t + R_t + D_t the fraction of confirmed cases (consisting of infected, recovered, and deceased cases) and use ΔC_t = ΔI_t + ΔR_t + ΔD_t together with (2) to see

Y_t = ΔC_t / (S_{t−1} I_{t−1}) = β_t,    (5)

see Lee et al. (2021, eqn. 2.2). As argued there, if for each t the data (C_t, R_t, D_t) can be observed, then the time-varying contact rate can be calculated straightforwardly via (5) using S_t = 1 − C_t as well as I_t = C_t − R_t − D_t. Unfortunately, reported case numbers for C_t, R_t, and D_t, such as the daily data from JHU CSSE used in the applications in section 4, suffer from measurement errors, see e.g. Hortaçsu et al. (2021). In addition, they display a strong weekly seasonal pattern that is likely to be driven by a varying number of tests conducted over the different days of the week (Bergman et al.). Under the assumption that Y_t is measured with a proportionally constant error variance resulting from seasonality and measurement errors, one has the following structure for the natural logarithm of the observable Ỹ_t.

Assumption 1 (Multiplicative seasonal and measurement errors). For each t, the observable Ỹ_t satisfies

log Ỹ_t = log β_t + Σ_{i=1}^7 α_i s_{i,t} + u_t,

with Y_t as given in (5). The s_{i,t}, i = 1, ..., 7, are seasonal dummies that capture the weekly patterns of reported case numbers, Σ_{i=1}^7 α_i = 0, and the measurement error u_t ∼ WN(0, σ²_u) is white noise.

Assumption 1 specifies an unobserved components (UC) model where the observable noisy measurement log Ỹ_t is decomposed into an unobservable measurement error u_t, seasonal components Σ_{i=1}^7 α_i s_{i,t}, and the log contact rate log β_t.
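The measurement (5) can be sketched directly. The check below runs it on a noiseless discrete SIR path with an assumed, constant β; in that case Y_t recovers β exactly, which is what makes (5) a valid (if noisy) measurement in the data:

```python
import numpy as np

def contact_rate_measurement(C, R, D):
    """Contact rate measurement Y_t = ΔC_t / (S_{t-1} I_{t-1}) from eqn (5),
    using S_t = 1 - C_t and I_t = C_t - R_t - D_t."""
    S, I = 1.0 - C, C - R - D
    return np.diff(C) / (S[:-1] * I[:-1])

# noiseless SIR path with known, assumed parameters
beta, g_r, g_d = 0.25, 0.09, 0.01
S, I, R, D = 1 - 1e-4, 1e-4, 0.0, 0.0
Cs, Rs, Ds = [I + R + D], [R], [D]
for _ in range(100):
    new, dR, dD = beta * S * I, g_r * I, g_d * I
    S, I, R, D = S - new, I + new - dR - dD, R + dR, D + dD
    Cs.append(I + R + D); Rs.append(R); Ds.append(D)

Y = contact_rate_measurement(np.array(Cs), np.array(Rs), np.array(Ds))
# without measurement error and seasonality, Y_t equals beta
```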
The log specification accounts for a proportional impact of measurement errors and seasonality, and forces the contact rate to be strictly positive. As the different components are not separately identified, an additional assumption on the dynamic structure of the contact rate is required. Empirical models of COVID-19 case numbers have so far assumed log β_t to follow a piece-wise linear time trend with structural breaks, see Hartl, Wälde and Weber (2020); Lee et al. (2021); Liu et al. (2021). As an alternative, the UC literature suggests modeling time-varying coefficients as random walks (see Durbin and Koopman, 2012, for an overview). Both specifications assume contact rate growth Δ log β_t to be only contemporaneously affected either by structural breaks or by stochastic shocks, an assumption that is likely to be violated. Reflecting that the persistence properties of social behavior, and thus of the contact rate, are unknown, assumption 2 specifies the log contact rate as a fractionally integrated process of unknown order d.

Assumption 2 (Specification of the contact rate). The log contact rate follows a type II fractionally integrated process of order d ∈ R_+, denoted as log β_t ∼ I(d),

log β_t = µ + x_t,    Δ^d_+ x_t = η_t,    η_t ∼ WN(0, σ²_η),

where µ is an intercept, and the η_t are white noise and independent of the measurement error u_t.

Under assumption 2, the log contact rate log β_t is a stochastic long memory process generated by the shocks {η_i}_{i=1}^t. The shock η_t models the information new in t, such as news reports or policy announcements. Social decisions, reflected in log β_t, however, may additionally depend on past information η_{t−1}, ..., η_1. Together, {η_i}_{i=1}^t forms the information available at t, conditional on which social decisions, e.g. on whether to meet, are made. The specification takes into account that new information does not evolve deterministically, but appears as stochastic shocks, which cannot be captured by a deterministic specification as e.g. in Lee et al. (2021).
The degree of persistence of the log contact rate is determined by the integration order d, which controls the persistent impact of past shocks via the fractional difference operator Δ^d_+. The latter exhibits a polynomial expansion in the lag operator L of infinite order,

Δ^d = (1 − L)^d = Σ_{i=0}^∞ π_i(d) L^i,    π_0(d) = 1,    π_i(d) = ((i − 1 − d)/i) π_{i−1}(d),    (6)

while the +-subscript denotes a truncation of the operator at t ≤ 0, Δ^d_+ x_t = Σ_{i=0}^{t−1} π_i(d) x_{t−i}, which reflects the type II definition of fractionally integrated processes (Marinucci and Robinson, 1999). For d = 1 the log contact rate is a random walk, which follows from plugging d = 1 into (6). Consequently, assumption 2 encompasses the predominant specification in the UC literature. However, assumption 2 allows for a far more general dynamic impact of past shocks η_1, ..., η_t on log β_t, as can be seen by plugging x_t = Δ^{−d}_+ η_t into log β_t = µ + x_t, which gives

log β_t = µ + Σ_{i=0}^{t−1} π_i(−d) η_{t−i}.    (7)

While a random walk is an unweighted sum of past shocks η_1, ..., η_t, so that π_i(−1) = 1 for all i = 1, ..., t − 1, allowing d ≠ 1 yields non-uniform weights of past shocks in the impulse response function of log β_t and thus a gradual adjustment of the log contact rate to new information. This reflects that social behavior adjusts step-wise to new information both at the individual and the aggregate level. As processing new information on the Coronavirus and revising individual decisions (e.g. meeting friends, traveling, working from home) takes time and evolves gradually, individuals can be expected to adjust their contacts step-wise in response to new information. Overall, individuals will react heterogeneously, both in terms of speed and intensity, to novel information: Some will anticipate new information faster than others, and the extent of the reaction will depend on individual characteristics such as risk awareness and attitudes.
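The recursion for the π_i in (6) is cheap to evaluate. A short sketch, which also confirms the two special cases just discussed (d = 1 reproduces the first difference, and π_i(−1) = 1 gives the random walk's uniform weights):

```python
import numpy as np

def pi_coeffs(d, n):
    """First n coefficients of (1 - L)^d = sum_i pi_i(d) L^i, computed via
    the recursion pi_0(d) = 1, pi_i(d) = pi_{i-1}(d) * (i - 1 - d) / i."""
    pi = np.empty(n)
    pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 - d) / i
    return pi

# d = 1 reproduces the first difference 1 - L ...
assert np.allclose(pi_coeffs(1.0, 5), [1.0, -1.0, 0.0, 0.0, 0.0])
# ... and pi_i(-1) = 1 for all i: a random walk weights all past shocks equally
assert np.allclose(pi_coeffs(-1.0, 5), 1.0)
```

Non-integer d then produces the non-uniform, hyperbolically decaying weights that generate the gradual adjustment described above.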
Such gradual adjustments are well captured by assumption 2, in particular when 1 < d < 2: In that case, contact rate growth Δ log β_t ∼ I(d − 1) is strongly persistent and mean-reverting, as will become apparent in the applications in section 4. Strong persistence reflects the gradual adjustment of social behavior to new information, while mean-reversion ensures an asymptotically declining impact of past information on today's contact rate growth. The remaining assumptions are imposed mainly for technical reasons. The type II definition of fractional integration assumes zero starting values for the fractionally integrated process by truncating the polynomial expansion of the fractional difference operator, Δ^d_+. It is required to treat the asymptotically stationary case (d < 1/2, from now on 'stationary' for brevity) and the asymptotically nonstationary case (d > 1/2, from now on 'nonstationary') alongside each other. While the type II definition may be a strong assumption for some time series, it is plausible for the contact rate, as the data cover roughly the whole pandemic. Thus, the pre-sample shocks η_i, i ≤ 0, should be zero. Independence of u_t and η_t follows from the characterization of u_t as a measurement error that should not influence the contact rate. In general, the assumption can be relaxed to allow for Corr(η_t, u_t) ≠ 0, as for instance in correlated UC models (Morley et al., 2003), and relaxing it will not affect the asymptotic results in section 3. The distributional assumptions on η_t and u_t are somewhat weaker than the assumption of Gaussian white noise on which UC models typically rely (Morley et al., 2003). They will be shown to be largely satisfied in the applications of section 4. Finally, d > 0 is required to separately identify log β_t and u_t. In this section, the fractional filter is derived. It is a computationally simple modification of the Kalman filter that avoids the Kalman recursions for the conditional variance.
The modification is necessary, as the Kalman filter becomes computationally infeasible for UC models when the dimension of the state vector is high, as it is for fractionally integrated processes. The fractional filter provides a closed-form expression for the prediction error of the UC model. Based on that, a conditional-sum-of-squares (CSS) estimator for the model parameters is set up. It minimizes the sum of squared prediction errors obtained from the fractional filter. The CSS estimator is shown to be consistent and asymptotically normally distributed. Given the CSS parameter estimates, the log contact rate can be estimated by the fractional filter given the full sample information. Finally, estimation of the mean and the seasonal components is considered. Under assumptions 1 and 2, the fractional UC model is given by

log Ỹ_t = µ + Σ_{i=1}^7 α_i s_{i,t} + x_t + u_t,    Δ^d_+ x_t = η_t.    (8)

Denote by µ_0, α_{1,0}, ..., α_{7,0}, d_0, σ²_{η,0}, σ²_{u,0} the true parameters of the data-generating mechanism. Leaving aside the deterministic terms for the moment, by defining y_t = log Ỹ_t − µ − Σ_{i=1}^7 α_i s_{i,t}, the stochastic part of the fractional UC model (8) is

y_t = x_t + u_t,    Δ^d_+ x_t = η_t.    (9)

In the following, let θ = (d, σ²_η, σ²_u)' ∈ Θ denote the vector holding the parameters of (9). Define F_t as the σ-algebra generated by y_1, ..., y_t, and let the expected value operator E_θ(z_t) of an arbitrary random variable z_t denote that expectation is taken with respect to the distribution of z_t given θ, so that E_{θ_0}(z_t) = E(z_t). Furthermore, let Σ^{(i,j)} denote the (i,j)-th entry of an arbitrary matrix Σ. Estimation of the parameters θ_0 is carried out by the CSS estimator, which minimizes the sum of squared prediction errors of model (9). The prediction error is defined as the one-step-ahead forecast error

v_{t+1}(θ) = y_{t+1} − E_θ(y_{t+1} | F_t) = y_{t+1} − E_θ(x_{t+1} | F_t).    (10)

It depends on E_θ(x_{t+1} | F_t), for which the fractional filter provides an analytical solution. The filter is introduced in the following lemma.

Lemma 3.1 (Fractional filter for x_{t+1} given F_t).
Under assumptions 1 and 2,

E_θ(x_{t+1} | F_t) = Σ_{i=1}^t π_i(−d) Σ^{(i,·)}_{η_{t:1} y_{t:1}} Σ^{−1}_{y_{t:1}} y_{t:1},

where y_{t:1} = (y_t, ..., y_1)', η_{t:1} = (η_t, ..., η_1)', Σ_{η_{t:1} y_{t:1}} = Cov_θ(η_{t:1}, y_{t:1}), and Σ_{y_{t:1}} = Var_θ(y_{t:1}). The superscript in Σ^{(i,·)}_{η_{t:1} y_{t:1}} denotes the i-th row of the matrix, and the entries of Σ_{η_{t:1} y_{t:1}} and Σ_{y_{t:1}} follow from the moving average representation (7) of model (9).

The proof is contained in appendix C. As can be seen from lemma 3.1, the fractional filter provides a solution for E_θ(x_{t+1} | F_t) that only depends on θ and y_1, ..., y_t. By plugging it into (10), one has the closed-form expression for the prediction error

v_{t+1}(θ) = y_{t+1} − Σ_{i=1}^t π_i(−d) Σ^{(i,·)}_{η_{t:1} y_{t:1}} Σ^{−1}_{y_{t:1}} y_{t:1}.    (11)

Based on (11), the objective function of the CSS estimator for θ_0 is set up as

θ̂ = argmin_{θ ∈ Θ} (1/n) Σ_{t=1}^n v_t(θ)².    (12)

Note that estimating the parameters of the fractional UC model via the CSS estimator (12) in combination with the fractional filter deviates from the methodological state space literature: There, expectation and variance of x_{t+1} conditional on F_t are typically obtained from the Kalman filter recursions (see e.g. Durbin and Koopman, 2012, ch. 4.3). The resulting prediction error and its conditional variance then enter the Gaussian (quasi-)likelihood function that is maximized to estimate θ_0. However, the Kalman filter becomes computationally infeasible when the dimension of the state vector is high, as it is for fractionally integrated processes. Thus, a computationally simpler filter is required. The fractional filter, as defined in lemma 3.1, is a modification of the Kalman filter: Its solution for E_θ(x_{t+1} | F_t) is identical to that of the Kalman filter (see Durbin and Koopman, 2012, ch. 4.2), but it avoids the Kalman recursions for the conditional variance of x_{t+1}. While the conditional variance is necessary for (quasi-)maximum likelihood estimation, the CSS estimator only requires a closed-form expression for the prediction error, for which the fractional filter is sufficient. The objective function of the CSS estimator in (12) is of course proportional to the exponent in the conditional Gaussian (quasi-)likelihood function.
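The prediction error (10)-(11) can be cross-checked by computing the linear projection with dense linear algebra. This is a brute-force O(n³)-per-step sketch for illustration only; the fractional filter organizes the same projection far more efficiently:

```python
import numpy as np

def pi_coeffs(d, n):
    pi = np.empty(n); pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 - d) / i
    return pi

def prediction_errors(y, d, s2_eta, s2_u):
    """One-step prediction errors v_t(theta) of model (9): y_t minus the
    linear projection of x_t on y_{t-1}, ..., y_1 (brute-force version)."""
    n = len(y)
    w = pi_coeffs(-d, n)                  # MA weights of x_t in the etas
    A = np.zeros((n, n))                  # x = A @ eta, A[t, s] = pi_{t-s}(-d)
    for t in range(n):
        A[t, : t + 1] = w[: t + 1][::-1]
    v = np.empty(n)
    v[0] = y[0]                           # E(x_1 | F_0) = 0 under type II
    for t in range(1, n):
        At = A[:t, :t]
        Vy = s2_eta * At @ At.T + s2_u * np.eye(t)   # Var of past observations
        cov = s2_eta * A[t, :t] @ At.T               # Cov(x_t, past obs)
        v[t] = y[t] - cov @ np.linalg.solve(Vy, y[:t])
    return v
```

As a sanity check: for d = 1 and σ²_u = 0 the state is a random walk observed without noise, so the projection equals the last observation and the prediction errors reduce to first differences.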
However, CSS estimation is computationally much simpler due to the fractional filter. Together, the fractional filter and the CSS estimator provide a computationally feasible alternative to the Kalman filter and the (quasi-)maximum likelihood estimator, particularly for UC models with richer long-run dynamics. While the asymptotic theory of the CSS estimator is well established for ARFIMA models, see Hualde and Robinson (2011) and Nielsen (2015), it has not yet been derived for structural UC models. To fill this gap, theorems 3.2 and 3.3 summarize the asymptotic estimation theory for the CSS estimator for fractional UC models. In addition, the finite sample properties are addressed by a Monte Carlo study in appendix B. For consistency and asymptotic normality of the CSS estimator, the moment assumptions on the shocks η_t, u_t need to be strengthened.

Assumption 3 (Higher moments of η_t, u_t). The conditional moments of η_t, u_t (conditional on past η_{t−1}, η_{t−2}, ..., and u_{t−1}, u_{t−2}, ...) are finite up to order four and equal the unconditional moments.

Theorem 3.2 (Consistency). Under assumptions 1 to 3, the CSS estimator θ̂ is consistent, θ̂ →_p θ_0 as n → ∞.

The proof of theorem 3.2 is given in appendix C and is carried out as follows: First, the model in (9) is shown to be identified. Next, v_t(θ), as given in (11), is shown to be integrated of order d_0 − d, and thus is stationary for d_0 − d < 1/2 and nonstationary for d_0 − d > 1/2. As the asymptotic behavior of the objective function changes around the point d_0 − d = 1/2, the objective function does not converge uniformly in probability on Θ. Adopting the results of Hualde and Robinson (2011) and Nielsen (2015), who show for ARFIMA models encompassing the reduced form of (9) that the probability that the CSS estimator stays in the region of the parameter space where v_t(θ) is nonstationary is asymptotically zero, the relevant region of Θ asymptotically reduces to the region where d_0 − d < 1/2 holds.
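The CSS objective (12) can be profiled numerically. The sketch below is not the paper's implementation: it computes the prediction errors through a Cholesky factorization of Var(y) (for C = chol(Var(y)), the one-step prediction errors are diag(C)·C⁻¹y, an equivalent Gaussian-projection computation), and a coarse grid search over d stands in for the numerical optimizer; the simulated data and grid are illustrative assumptions:

```python
import numpy as np

def pi_coeffs(d, n):
    pi = np.empty(n); pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 - d) / i
    return pi

def css_objective(y, d, s2_eta, s2_u):
    """Mean squared one-step prediction error, the objective in eqn (12)."""
    n = len(y)
    w = pi_coeffs(-d, n)
    A = np.zeros((n, n))
    for t in range(n):
        A[t, : t + 1] = w[: t + 1][::-1]          # x = A @ eta
    Sigma = s2_eta * A @ A.T + s2_u * np.eye(n)   # Var(y_1, ..., y_n)
    C = np.linalg.cholesky(Sigma)
    v = np.diag(C) * np.linalg.solve(C, y)        # one-step prediction errors
    return np.mean(v ** 2)

# profile the objective over d on simulated random walk data (true d0 = 1)
rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(300))
grid = [0.4, 1.0, 1.6]
d_hat = min(grid, key=lambda d: css_objective(y, d, 1.0, 0.0))
```

Under- and over-differencing both inflate the mean squared prediction error, so the profile is minimized near the true integration order.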
Within the relevant region of the parameter space, this paper then proves weak convergence of the objective function by showing that it satisfies a uniform weak law of large numbers. This yields consistency of the CSS estimator, see Wooldridge (1994, thm. 4.3).

Theorem 3.3 (Asymptotic normality). Under assumptions 1 to 3, the CSS estimator θ̂ is asymptotically normally distributed.

The proof of theorem 3.3 is again contained in appendix C. Since the CSS estimator is consistent, the asymptotic distribution theory is inferred from a Taylor expansion of the score function about θ_0. A central limit theorem is shown to hold for the score function at θ_0, together with a uniform weak law of large numbers for the Hessian matrix. The latter allows the Hessian matrix in the Taylor expansion of the score function to be evaluated at θ_0. Thus, the asymptotic distribution of the CSS estimator, as given in theorem 3.3, can be inferred from solving the Taylor expansion for √n(θ̂ − θ_0). As usual in the state space literature, no analytical solution for the asymptotic variance of the CSS estimator can be provided. The parameters of the reduced form depend non-trivially on θ, so that the partial derivatives of the reduced form cannot be derived analytically. However, from theorem 3.3 it follows that an estimate for the parameter covariance matrix can be obtained from the negative inverse of the Hessian matrix computed in the numerical optimization. Estimation of the latent component x_t in (9) is considered next. In line with the methodological literature on state space models, x_t is estimated by plugging the CSS estimates θ̂ into the projection

x_{t|n}(θ) = Cov_θ(x_t, y_{n:1}) Var_θ(y_{n:1})^{−1} y_{n:1},    (13)

where y_{n:1} = (y_n, ..., y_1)', Σ_{η_{t:1} y_{n:1}} = Cov_θ(η_{t:1}, y_{n:1}), and Σ_{y_{n:1}} = Var_θ(y_{n:1}). The superscript in Σ^{(i,·)}_{η_{t:1} y_{n:1}} denotes the i-th row of the matrix, while the entries of Σ_{y_{n:1}} follow from lemma 3.1 by setting t = n.
For θ = θ_0, x_{t|n}(θ_0) is the minimum variance linear unbiased estimator given y_1, ..., y_n, see Durbin and Koopman (2012, lemma 2). Due to theorem 3.2, this property holds asymptotically for x_{t|n}(θ̂). Note that (13) is identical to the Kalman smoother, see Durbin and Koopman (2012, ch. 4.4). However, (13) is computationally much simpler, as it avoids the computationally intensive Kalman recursions for the conditional variance. In line with lemma 3.1, (13) is the fractional filter for x_t given F_n. Finally, estimation of the seasonal components α_{i,0}, i = 1, ..., 7, and µ_0 is considered. Theoretically, all parameters of the model in (8) could be estimated jointly by the CSS estimator. But as Tschernig et al. (2013) explain, including deterministic terms in the optimization can lead to poor results in finite samples for fractionally integrated processes, particularly when d_0 is close to unity, as the deterministic terms suffer from poor identification. They provide simulation evidence and a line of reasoning explaining why the following two-step estimator is more robust: In the first step, the integration order d_0 is estimated using the exact local Whittle estimator of Shimotsu (2010), which allows for unknown deterministic terms and yields d̂_EW. Based on d̂_EW, the deterministic terms µ_0, α_{1,0}, ..., α_{7,0} are estimated by ordinary least squares in the d̂_EW-differenced model (14). In the second step, the objective function of the CSS estimator in (12) is minimized for the adjusted log Ỹ_t − μ̂ − Σ_{i=1}^7 α̂_i s_{i,t}. As an alternative to (14), one could also eliminate the seasonal components by averaging log Ỹ_t over seven consecutive days, from t − q to t + 6 − q; the mean µ_0 could then be estimated by ordinary least squares. Here q determines whether averages are calculated solely based on past data (q = 6), based on data centered around t (q = 3), or based on future data (q = 0). While the second approach does not require estimating α_{1,0}, ..., α_{7,0}, averaging over seven days smooths out potential kinks in the contact rate, which is problematic.
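The OLS part of the first step can be sketched as follows. The exact local Whittle step is taken as given here (d_hat is an assumed input), and one dummy is dropped against the intercept instead of imposing Σ α_i = 0, a reparametrization chosen for this sketch:

```python
import numpy as np

def frac_diff(x, d):
    """Type II fractional difference: (Δ^d_+ x)_t = Σ_{i=0}^{t} π_i(d) x_{t-i}."""
    n = len(x)
    pi = np.empty(n); pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 - d) / i
    return np.array([pi[: t + 1] @ x[t::-1] for t in range(n)])

def estimate_deterministics(log_y, d_hat):
    """OLS for the intercept and weekly dummies in the d_hat-differenced
    model; returns the adjusted series and the coefficient estimates."""
    n = len(log_y)
    X = np.column_stack([np.ones(n)] +
                        [(np.arange(n) % 7 == i).astype(float) for i in range(6)])
    Xd = np.column_stack([frac_diff(X[:, j], d_hat) for j in range(X.shape[1])])
    coef, *_ = np.linalg.lstsq(Xd, frac_diff(log_y, d_hat), rcond=None)
    return log_y - X @ coef, coef
```

Differencing both the observations and the deterministic regressors keeps the regression balanced, so the same coefficients apply in levels when the adjusted series is formed.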
Furthermore, averaging may pollute the estimates of x_t and induce spurious long memory. Finally, the choice of q is not trivial: While for forecasting purposes q = 6 is adequate, choosing q = 0 is likely to account best for the delay in reporting of case numbers, and q = 3 may obviously be a good compromise between the two options. In the applications, µ_0, α_{1,0}, ..., α_{7,0} will be estimated via (14). In this section, estimation results for the time-varying contact rate β_t are presented for Canada, Germany, Italy, and the United States. The underlying data on confirmed, recovered, and deceased cases are the daily data from the JHU CSSE (Dong et al., 2020). Instead of smoothing out seasonality, the data is adjusted for weekly seasonal patterns as described at the end of section 3 using (14). The bandwidth for the exact local Whittle estimator in (14) is set to m = n^{0.65}, which is justified by the Monte Carlo study in appendix B. Based on the seasonally adjusted data, the parameters θ_0 are estimated via the CSS estimator (12). Plugging the CSS estimates into (13), together with μ̂ in (14), yields the log contact rate estimate log β̂_t. The average infected period is required for R_t and is estimated by solving (2) for γ and taking the average,

γ̂ = (1/(n − 1)) Σ_{t=2}^n (ΔR_t + ΔD_t) / I_{t−1}.

This reflects that the definition of recovered varies over the countries under study, particularly as non-hospitalized persons are typically assumed to have recovered h days after they tested positive, and h varies over the countries under study. The choice of h proportionally affects the number of currently infected I_t, and thus β_t is inversely proportional to h by (5). For Canada, figure 1 sketches the estimated log contact rate and the resulting reproduction rate R̂_t = β̂_t/γ̂ in the first row. The average duration of an infection is estimated to be 1/γ̂ = 18.29 days. The second row of figure 1 displays the estimated prediction error v_t(θ̂) and its estimated autocorrelation function.
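The averaging step behind γ̂ can be sketched on synthetic data. The construction below keeps the infected level constant (I = 1 in arbitrary units, with assumed, illustrative rates) so that the recovered value can be verified by hand:

```python
import numpy as np

def estimate_gamma(C, R, D):
    """Solve eqns (2)-(4) for gamma at each t, gamma_t = (ΔR_t + ΔD_t) / I_{t-1},
    with I = C - R - D, and average over the sample."""
    I = C - R - D
    return np.mean((np.diff(R) + np.diff(D)) / I[:-1])

t = np.arange(120.0)
g_r, g_d = 0.05, 0.005          # assumed recovery and death rates
R, D = g_r * t, g_d * t         # linear outflows correspond to constant I = 1
C = 1.0 + (g_r + g_d) * t       # so that I = C - R - D = 1 for all t
gamma_hat = estimate_gamma(C, R, D)
```

Here each period's outflow is exactly γ·I with I = 1, so the average recovers γ = 0.055 and the implied infected period 1/γ̂ ≈ 18.2 days, close to the Canadian estimate quoted above.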
Based on the top-left panel of figure 1, several turning points of the contact rate can be identified using a simple algorithm that defines a minimum (maximum) whenever the contact rate β_t at t is smaller (greater) than all of β_{t+1}, ..., β_{t+10}, the contact rates of the next ten days. The bottom-left panel shows the estimated prediction error v_t(θ̂) in (11) together with two standard deviations in blue, dashed. The bottom-right panel sketches the estimated autocorrelation function of the prediction error v_t(θ̂) together with a 95% confidence interval.

6. October 1 - October 17: Contact and reproduction rate slightly decrease. Thanksgiving takes place on October 12.
7. October 18 - November 7: After Thanksgiving, contact and reproduction rate increase slightly. On November 3rd, Ontario introduces an incidence-based system for when to tighten containment measures.
8. November 8 - December 23: Contact and reproduction rate exhibit a slight but steady decrease. The reproduction rate remains above unity.

The estimated contact rate and the resulting reproduction rate are well in line with the chronology of policy interventions. In particular, the fractional filter makes it possible to identify turning points of the contact rate that are not visible from the raw data, which is plotted in gray in figure 1. The two graphs at the bottom of figure 1 illustrate how well the Canadian data fits the model assumptions. Assumptions 1 and 2 assume the measurement error u_t and the log contact rate shock η_t to be homoscedastic white noise processes. Since θ̂ is consistent, see theorem 3.2, the estimated prediction errors can be used to assess these assumptions. The estimated integration order is d̂ = 1.2166, which implies that a unit shock to contact rate growth Δ log β_t retains 21.66% of its impact in t + 1, 13.17% in t + 2, and 9.73% in t + 3.
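The turning-point rule quoted above is easy to state in code. This takes the rule literally as described (a point is flagged when it is below/above everything in the next k days); note that, taken literally, it also flags consecutive points on a monotone run, which in practice one would merge:

```python
import numpy as np

def turning_points(beta, k=10):
    """Flag t as a local minimum (maximum) when beta_t is smaller (greater)
    than all of beta_{t+1}, ..., beta_{t+k} -- the literal rule from the text."""
    minima, maxima = [], []
    for t in range(len(beta) - k):
        ahead = beta[t + 1 : t + 1 + k]
        if beta[t] < ahead.min():
            minima.append(t)
        if beta[t] > ahead.max():
            maxima.append(t)
    return minima, maxima

# a V-shaped toy path: peak at the start, trough at index 3
b = np.array([5.0, 4.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0])
mins, maxs = turning_points(b, k=3)
```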
After one week, the impact is still 5.10%, after two weeks 2.98%, and after three weeks 2.17%, which reflects the strong persistence of fractionally integrated processes; see assumption 2 for the formula for π_i(d − 1). The slow decay may very well describe the persistent impact of past information shocks on today's social behavior.

The US is treated separately, since data on recovered cases reported by the JHU CSSE seems heavily downward-biased. To see this, consider the difference between lagged cumulative confirmed, cumulative recovered, and cumulative deceased cases for different lags h,

C_{t−h} − R_t − D_t.    (17)

For h = 0, (17) measures the number of currently infected subjects. For small h, (17) should be positive, as it takes some time for the infected subjects to either recover or die. As h increases, (17) should turn negative, as an increasing number of subjects contained in the cumulative cases C_{t−h}, as well as subjects infected between t − h and t (and thus contained in C_t − C_{t−h}), either recover or die. The turning point, denoted by h̄, should be close to the average infected period 1/γ, as long as new confirmed cases between t − h and t, i.e. C_t − C_{t−h}, do not explode. If they do, then h̄ should be smaller than 1/γ, as outflows from C_t − C_{t−h} disproportionally increase R_t and D_t. Figure 4 plots (17) for different lags h.

Since no reliable data on recovered cases is available for the US, an approximation is required. In the following, it will be assumed that individuals either recover or die h̄ = 21 days after they tested positive, so that C_{t−21} = D_t + R_t. The assumption is justified as follows: First, it is similar to the average infected period estimated for Germany and more conservative than the estimate for Canada. And second, it is centered in the range of definitions for recovered individuals by the federal states. In addition, estimates for h̄ = 18 and h̄ = 24 days are presented, which gives a reasonable interval for the contact rate.
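The quoted retention percentages follow from the MA coefficients of the fractional filter (1 − L)^{−b} with b = d − 1; a minimal sketch of the standard recursion (the function name is illustrative):

```python
def frac_ma_coeffs(b, horizon):
    """MA coefficients pi_i(b) of the fractional filter (1 - L)^(-b),
    via the standard recursion pi_0 = 1, pi_i = pi_{i-1} * (i - 1 + b) / i.
    For b = d - 1, pi_i(b) is the share of a unit shock to contact rate
    growth that is retained i periods later."""
    pi = [1.0]
    for i in range(1, horizon + 1):
        pi.append(pi[-1] * (i - 1 + b) / i)
    return pi
```

With d̂ = 1.2166, `frac_ma_coeffs(0.2166, 21)` reproduces the figures in the text: π_1 ≈ 0.2166, π_2 ≈ 0.1318, π_7 ≈ 0.0510, π_14 ≈ 0.0298, and π_21 ≈ 0.0217, illustrating the slow hyperbolic decay.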
Under the assumption of h̄ = 21, the estimated integration order equals d̂ = 1.2499 (see table A ).

This subsection investigates the end-of-sample properties of the fractional filter for real-time estimation of the contact rate. Reliable contact rate estimates at the current frontier of the data would allow to monitor the state of the pandemic in real time and can serve as a surveillance measure for future outbreaks. Based on reliable real-time estimates for the contact rate, policy rules can be implemented to prevent an exponential growth of case numbers. Acting early reduces the economic and social costs of containment measures, and consequently a well-designed policy rule will be beneficial, given that the fractional filter yields a reliable estimate for the current level of the contact rate. Drawing inference on the latter is the focus of this subsection.

In detail, real-time monitoring is simulated by truncating the sample at a certain point t, r ≤ t ≤ n, where r is the minimum sample size for the CSS estimator to produce reasonable estimates. The parameters θ_0, μ_0, α_{1,0}, ..., α_{7,0} of (8) are then estimated as described in section 3 using the information available at time t, F_t, and the resulting parameter estimates are denoted as θ̂^{(t)}, μ̂^{(t)}, etc. To take into account reporting lags, and to be robust against outliers at the end of the sample, a little backward-smoothing is allowed by reporting the smoothed estimate for the log contact rate at period t − 3 given the information available at period t. From (13), the smoothed estimates log β̂_{t−3|t} in (18) follow. As (18) only depends on information available at t, it mimics the situation of a policy maker at t and can be used to draw inference on the monitoring properties of the fractional filter at time t. Based on β̂_{t−3|t}, policy rules to prevent an exponential spread of the virus can be designed. Such rules could, for instance, define a threshold for R̂_{t−3} at which additional containment measures are implemented.
As the threshold should naturally depend on the number of currently infected, current hospital capacities, and other parameters, the precise design of such a policy rule is left to the experts, and only a primitive policy rule will be introduced later for illustrative purposes.

The reliability of real-time estimates for the contact rate is assessed by the following experiment: First, an estimation sample that consists of information available until May 31 is defined, for which θ_0, μ_0, α_{1,0}, ..., α_{7,0} are estimated. It consists of at least 80 observations, which is considered a reasonable size for the estimation sample. Based on these estimates, log β̂_{r−3|r} is obtained as described above. In a second step, information available on June 1 is added to the sample, and the parameter estimates are updated using the θ̂^{(r)} from the estimation sample as starting values for the CSS estimator, which gives θ̂^{(r+1)}. As before, the estimate log β̂_{r−2|r+1} is stored. The procedure repeats for all t, r < t ≤ n, where in every step t the CSS estimator is initialized by θ̂^{(t−1)}. The resulting real-time estimates for the contact rate are then compared to those of subsections 4.1 and 4.2 to draw inference on their reliability. In addition, a primitive policy rule is introduced. It assumes governments to take action as soon as R̂_{t−3} = β̂_{t−3|t}/γ̂ > 1.2. The latter is motivated by the observation that preventing an exponential propagation (i.e. R_{t−3} > 1) is desirable, and a margin of 0.2 is included to be robust against outliers. Finally, the real-time contact rate estimates are compared to a rolling seven-day average which includes three forward-looking observations and should smooth out the seasonality. The four panels on the right side of figure 6 display the deviations of log β̂_{t−3|t} and log β̂^{benchmark}_{t−3|t} from the log contact rate estimates based on the full sample information, log β̂_{t−3|n}.
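A minimal sketch of the rolling seven-day benchmark and the primitive policy rule described above; the 1.2 threshold and the centered window with three forward-looking observations are taken from the text, while the function names and input series are illustrative:

```python
import numpy as np

def seven_day_benchmark(beta):
    """Centered rolling seven-day average using three forward-looking
    observations, the benchmark the real-time estimates are compared to."""
    beta = np.asarray(beta, float)
    out = np.full(len(beta), np.nan)   # undefined at the sample edges
    for t in range(3, len(beta) - 3):
        out[t] = beta[t - 3:t + 4].mean()
    return out

def policy_alert(R_hat, threshold=1.2):
    """Primitive policy rule: flag all periods where the (lagged, smoothed)
    reproduction rate estimate exceeds the threshold."""
    return [t for t, r in enumerate(R_hat) if r > threshold]
```

Because the benchmark needs three future observations, it is undefined over the last three days of the sample, which is exactly where the fractional filter's end-of-sample estimates are meant to add value.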
Thus, they shed light on whether the fractional filter improves estimates for the contact rate compared to a rolling seven-day average that uses three forward-looking observations. For Italy and the US, the advantages of the fractional filter directly become apparent, as the benchmark exhibits greater deviations. For Canada and Germany, the fractional filter performs comparably well when large outliers occur, e.g. around July 20 for Canada and around June 20 for Germany.

To extract a time-varying signal for the COVID-19 contact rate from daily data on confirmed, recovered, and deceased cases, this paper introduces a novel unobserved components model. It models the log contact rate as a fractionally integrated process of unknown integration order. A computationally simple modification of the Kalman filter is introduced and is termed the fractional filter. It provides a closed-form expression for the prediction error that allows to estimate the model parameters by a conditional-sum-of-squares (CSS) estimator. The asymptotic theory for the CSS estimator is provided. For the countries under study, estimation results are well in line with the chronology of the pandemic. They allow to draw inference on the impact of policy measures such as contact restrictions. The new filtering method bears great potential as a monitoring device for the current state of the pandemic, as it yields reliable contact rate estimates at the current frontier of the data.

As vaccines become more and more available, future research can generalize the model to include the number of vaccinated. For instance, this can be done by decomposing 1 = S_t + I_t + R_t + D_t + V_t, where V_t is the fraction of vaccinated. The states R_t and V_t should be non-overlapping as long as vaccines are not rolled out to recovered subjects.
While vaccine recommendations vary over the different countries, some assign a lower priority to recovered subjects, so that R_t and V_t are non-overlapping at the early stage of the vaccine roll-out. Furthermore, mutations of the Coronavirus can be taken into account, e.g. by allowing for a smooth transition between a contact rate with a low probability of virus transmission and one with a high probability.

For applications beyond COVID-19 related data, the fractional filter offers a robust, flexible, and data-driven way for signal extraction of data of unknown persistence. It requires no prior assumptions on the integration order of a process, and thus provides a solution to model specification in the unobserved components literature. Due to its computational advantages compared to the classic Kalman filter, it allows to estimate unobserved components models with richer dynamics.

The finite sample performance of the CSS estimator is assessed in a Monte Carlo study, where, to be in line with (9), the data-generating mechanism is given by

y_t = x_t + u_t,    Δ_+^{d_0} x_t = η_t,

where u_t and η_t are uncorrelated white noise processes and σ²_{η,0} = ρσ²_{u,0}, so that ρ controls the signal-to-noise ratio. The integration orders d_0 ∈ {0.75, 1.25, 1.75} cover the relevant interval for the applications in section 4, while ρ ∈ {0.5, 1, 2} captures high and low signal-to-noise ratios. The variance parameter is set to σ²_{u,0} = 1. Different sample sizes n ∈ {100, 200, 300} covering the relevant regions for the applications in section 4 are considered. The parameters θ_0 = (d_0, σ²_{η,0}, σ²_{u,0})′ are estimated via the CSS estimator as described in section 3. For each specification, 1000 replications are simulated, and starting values are set to θ_start = (1, 1, 1)′. In addition to the CSS estimates, estimation results for d_0 from the exact local Whittle estimator of Shimotsu (2010) are reported as benchmarks for m = n^j Fourier frequencies, j ∈ {0.45, 0.50, 0.55, 0.60, 0.65, 0.70}.
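The simulation step of this design can be sketched as follows, assuming the unobserved components form y_t = x_t + u_t with a type II fractionally integrated latent component x_t built from its truncated MA representation; names and construction details are illustrative, not the paper's code:

```python
import numpy as np

def simulate_uc(n, d0, rho, sigma2_u=1.0, seed=0):
    """Simulate y_t = x_t + u_t, where x_t is a type II fractionally
    integrated process of order d0 and sigma2_eta = rho * sigma2_u
    controls the signal-to-noise ratio."""
    rng = np.random.default_rng(seed)
    # MA coefficients of (1 - L)^(-d0): pi_0 = 1, pi_i = pi_{i-1}(i - 1 + d0)/i
    pi = np.empty(n)
    pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 + d0) / i
    eta = rng.standard_normal(n) * np.sqrt(rho * sigma2_u)
    # type II construction: x_t depends only on shocks from t = 1 onwards
    x = np.array([pi[:t + 1][::-1] @ eta[:t + 1] for t in range(n)])
    u = rng.standard_normal(n) * np.sqrt(sigma2_u)
    return x + u, x
```

For d_0 > 1 the simulated latent component is a nonstationary trend whose sample variation dwarfs the measurement noise, which is consistent with the finding below that x_t is estimated more precisely for higher d_0.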
Finally, the mean squared error MSE_x and the coefficient of determination R²_x for the estimation of x_t are reported. They are calculated as

MSE_x = n^{-1} Σ_{t=1}^{n} (x̂_t − x_t)²,    R²_x = 1 − Σ_{t=1}^{n} (x̂_t − x_t)² / Σ_{t=1}^{n} (x_t − x̄)²,

and indicate how well x_t is estimated by the fractional filter (13).

The results for the Monte Carlo study are contained in the tables of appendix B. The estimation error for d_0 is smaller for higher d_0, which is plausible, as the fraction of total variation of y_t generated by x_t increases with d_0. For the same reason, it decreases with increasing ρ. The same conclusions on the precision with which d_0 is estimated hold for the mean squared error of x_t, which decreases as n, d_0, and ρ increase. The proportion of explained variation of x_t, measured by R²_x, is high, and thus x_t is estimated well via (13). Particularly for d_0 = 1.25, which is the relevant case for the applications in section 4, the R²_x is close to unity for all n.

Proof of Lemma 3.1. First, note that E_θ(x_{t+1}|F_t) = E_θ(y_{t+1}|F_t), so that it is sufficient to derive the latter expression. For this, consider the reduced form of (9), which follows from taking fractional differences and utilizing the aggregation properties of MA processes, see Granger and Morris (1976),

Δ_+^d y_t = φ(L, θ)ε_t,    ε_t ~ WN(0, σ²_ε),    (C.1)

where φ(L, θ) is invertible. σ²_ε and the coefficients in φ(L, θ) can be derived by matching the autocovariance functions of (C.1), see Watson (1986, eqn. 2.6), and depend non-linearly on θ. However, they are not required for the proof. Solving for ε_t yields ε_t = φ(L, θ)^{-1} Δ_+^d y_t. From the type II definition of fractional integration, see assumption 2, it follows that Cov_θ(y_t, y_j) = 0 for all j ≤ 0, t > 0, and thus E_θ(y_{t+1}|F_t) = Σ_{i=1}^{t} A_i(θ) y_{t+1−i}. The (yet unknown) coefficients A_i(θ) follow from the Yule-Walker equations Cov_θ(y_{t+1}, y_{t:1}) = A(θ) Var_θ(y_{t:1}), where

Var_θ(y_{t:1}) =
[ Var_θ(y_t)           Cov_θ(y_{t−1}, y_t)   ···  Cov_θ(y_1, y_t)
  Cov_θ(y_t, y_{t−1})  Var_θ(y_{t−1})        ···  Cov_θ(y_1, y_{t−1})
  ⋮                    ⋮                          ⋮
  Cov_θ(y_t, y_1)      Cov_θ(y_{t−1}, y_1)   ···  Var_θ(y_1) ],

so that by defining the vectors A(θ) = (A_1(θ), ..., A_t(θ)), y_{t:1} = (y_t, ..., y_1)′, and solving the Yule-Walker equations for A(θ), one has A(θ) = Cov_θ(y_{t+1}, y_{t:1}) Var_θ(y_{t:1})^{-1}, which implies Σ_{i=1}^{t} A_i(θ) y_{t+1−i} = Cov_θ(y_{t+1}, y_{t:1}) Var_θ(y_{t:1})^{-1} y_{t:1}. From Cov_θ(y_{t+1}, y_{t:1}) = Cov_θ(x_{t+1}, y_{t:1}), the result follows.

Proof of Theorem 3.2. First, the model in (9) is shown to be identified. Identification follows if the parameters σ²_η, σ²_u can be recovered from the autocovariance function of the reduced form φ(L, θ)ε_t in (C.1). To see this, consider the covariances

[ Var_θ(φ(L, θ)ε_t)                  ]   [ 1   Σ_{i=0}^{t−1} π_i(d)²          ] [ σ²_η ]
[ Cov_θ(φ(L, θ)ε_t, φ(L, θ)ε_{t−1}) ] = [ 0   Σ_{i=0}^{t−2} π_i(d)π_{i+1}(d) ] [ σ²_u ],    (C.2)

so that solving for (σ²_η, σ²_u) shows that (σ²_η, σ²_u) can be uniquely recovered from the reduced form. The assumption that d > 0 is crucial, as it guarantees Σ_{i=0}^{t−2} π_i(d)π_{i+1}(d) ≠ 0, so that the matrix in (C.2) has full rank.

Next, the CSS estimator based on the reduced form (C.1) is derived and is shown to be identical to (12). Multiplying (C.1) by φ(L, θ)^{-1} yields ε_t = φ(L, θ)^{-1} Δ_+^d y_t, based on which a reduced form CSS estimator (C.3) can be constructed that equals the CSS estimator in (12). To see this, add and subtract y_t from ε_t(θ) = ψ_+(L, θ)Δ_+^d y_t, so that y_t = (1 − ψ_+(L, θ)Δ_+^d)y_t + ε_t(θ), and plug y_t into the conditional expectation in (10) defining v_t(θ). The third equality follows from (1 − ψ_+(L, θ)Δ_+^d)y_t being F_{t−1}-measurable, since ψ(L, θ) = 1 − Σ_{i=1}^{∞} ψ_i(θ)L^i and π_0(d) = 1. Thus, the contemporaneous y_t cancels in the expectation operator, and the whole term can be taken out of the expectation operator. From v_t(θ) = ε_t(θ) it follows that the optimization problems in (12) and (C.3) are identical.

Next, the integration order of the residuals is assessed. Since y_t ∼ I(d_0), the residuals satisfy ε_t(θ) ∼ I(d_0 − d). For d_0 − d < 1/2 the residuals are stationary, while for d_0 − d > 1/2 they are nonstationary.
As the asymptotic behavior of the objective function changes around d_0 − d = 1/2, the objective function does not uniformly converge on the set of admissible values for d. The same problem is addressed by Hualde and Robinson (2011) and by Nielsen (2015) for ARFIMA models encompassing (C.1). Nielsen (2015, eqn. 8) shows that a weak law of large numbers (WLLN) applies to the sum of squared residuals whenever d_0 − d < 1/2, while the sum of squared residuals diverges in probability whenever d_0 − d > 1/2, implying that

Pr(d̂ ∈ D*(κ), θ̂ ∈ Θ) → 1 as n → ∞,    (C.6)

for any fixed 0 < κ < 1/2. From (C.6) it follows that the relevant parameter space asymptotically reduces to the stationary region Θ*(κ) = {θ | θ ∈ Θ, d ∈ D*(κ)}.

To prove (C.7), it will be helpful to note that for a white noise process ε_t, MA weights satisfying Σ_{i=0}^{∞} |m_{h,i}(θ)| < ∞, h = 1, 2, and the set Θ̄ = {θ | θ ∈ Θ, d_0 − d < 1/2}, the bound (C.8) holds for j, k ≥ 0, as shown by Nielsen (2015, lemma B.3). Now, consider the partial derivatives of (C.3) in (C.9). Since ψ_+(L, θ) satisfies the absolute summability condition for (C.8), it follows that (C.10) holds, while the partial derivatives of Δ_+^{d−d_0} w.r.t. σ²_η, σ²_u are zero. For the remaining term in (C.9), note that the sum of absolute coefficients of the truncated polynomial ψ_+(L, θ) is bounded by the sum of absolute coefficients of the untruncated polynomial ψ(L, θ) = φ(L, θ)^{-1}. Thus, it is sufficient to prove absolute summability of the coefficients in ∂ψ(L, θ)/∂θ = −ψ(L, θ)²(∂φ(L, θ)/∂θ). Absolute summability of the coefficients in ∂φ(L, θ)/∂θ is shown in lemma D.1 in appendix D. Since ψ(L, θ) is stable, ∂ψ(L, θ)/∂θ satisfies the absolute summability condition for (C.8), and thus (C.11) holds. From (C.10) and (C.11) it follows that (C.7) holds. Consequently, the supremum of the gradient satisfies a WLLN for θ ∈ Θ*(κ), which generalizes the pointwise convergence of the objective function to weak convergence, implying that a uniform WLLN (UWLLN) holds for the objective function.
Since the model is identified, consistency of the CSS estimator follows from the UWLLN together with (C.6), and thus θ̂ →_p θ_0 as n → ∞, see Wooldridge (1994, thm. 4.3).

Proof of Theorem 3.3. Since the CSS estimator is consistent, see theorem 3.2, the asymptotic distribution theory can be inferred from a Taylor expansion of the score function about θ_0 in (C.12), where the entries in θ̄ satisfy |θ̄_i − θ_{0,i}| ≤ |θ̂_i − θ_{0,i}| for all i = 1, 2, 3, and θ_i denotes the i-th entry of θ = (d, σ²_η, σ²_u)′, i = 1, 2, 3. The score function at θ_0 follows from (C.9) and is given in (C.14), where ε̃_t(θ) = ψ(L, θ)φ(L, θ_0)Δ^{d−d_0}ε_t is the untruncated residual generated by the untruncated Δ^d and ψ(L, θ) = 1 − Σ_{i=1}^{∞} ψ_i(θ)L^i, and the second equality in (C.13) is shown to hold by Robinson (2006, pp. 135-136). In the following, let S_n^{(j)} denote the j-th entry of S_n holding the partial derivative w.r.t. θ_j, j = 1, 2, 3, and let C_{1,j}(L, θ) = Σ_{i=1}^{∞} C_{1,j,i}(θ)L^i = φ(L, θ_0)(∂/∂θ_j)[ψ(L, θ)Δ^{d−d_0}] denote the coefficients of the partial derivative of ε̃_t(θ) w.r.t. θ_j.

To derive the asymptotic distribution theory for the CSS estimator, a central limit theorem (CLT) is shown to hold for the score function at θ_0. Next, it is proven that a UWLLN holds for the Hessian matrix by showing that the Hessian matrix and its first partial derivatives satisfy a WLLN (Wooldridge, 1994, thm. 4.2). The UWLLN allows to evaluate the Hessian matrix in (C.12) at θ_0 and yields the asymptotic distribution of √n(θ̂ − θ_0). As the reduced form coefficients φ(L) depend non-trivially on θ, no analytical expression for the asymptotic variance of the CSS estimator is provided. Instead, it will be shown that the CSS estimator is asymptotically normally distributed, and its asymptotic variance is shown to exist. This allows to estimate Var(θ̂), e.g. via the inverse of the numerical Hessian matrix. Starting with the score function, similar to Nielsen (2015, p.
175), a CLT can be inferred from the Cramér-Wold device by showing that for any 3-dimensional vector μ = (μ_1, μ_2, μ_3)′, μ′S_n is asymptotically normal. To see this, define the σ-algebra F̄_t = σ({ε_s, s ≤ t}) generated by the white noise ε_t and its lags. Next, note that in (C.14) the term ε_t[∂ε̃_t(θ)/∂θ|_{θ=θ_0}] adapted to F̄_t is a stationary MDS, since ε_t is white noise, the partial derivatives are F̄_{t−1}-measurable, and the coefficients of the partial derivatives are absolutely summable, as shown in the proof of theorem 3.2. It follows for μ′S_n = 2n^{−1/2} Σ_{t=1}^{n} ν_t, with ν_t defined accordingly, that ν_t adapted to F̄_t is a stationary MDS. Similar to Nielsen (2015, p. 175), by the law of large numbers for stationary and ergodic processes, the sum of conditional variances for μ′S_n, with S_n as given in (C.14), converges to its expectation: the partial derivatives of the first polynomial, ∂ψ(L, θ)/∂θ_j = −ψ(L, θ)²(∂φ(L, θ)/∂θ_j), are absolutely summable for all j = 1, 2, 3, as ψ(L, θ) and ∂φ(L, θ)/∂θ_j are absolutely summable, see lemma D.1 in appendix D. Consequently, by the CLT for stationary MDS (see e.g. Davidson, 2000, thm. 6), the score function at θ_0 is asymptotically normally distributed.

To evaluate the Hessian matrix in (C.12) at θ_0, it remains to be shown that a UWLLN applies to the Hessian matrix (Wooldridge, 1994, thm. 4.4), for which it is sufficient to show that a WLLN holds for the Hessian matrix and for the supremum of its first partial derivatives,

sup_{θ ∈ Θ*(κ)} |∂³R_t(θ)/(∂θ_j ∂θ_k ∂θ_l)| = O_p(1), j, k, l = 1, 2, 3,    (C.16)

for any fixed κ ∈ (0, 1/2), see Newey (1991, cor. 2.2) and Wooldridge (1994, thm. 4.2). The Hessian matrix can be derived from (C.9) and is given in (C.17). A WLLN holds for the Hessian matrix if the absolute summability condition for (C.8) is satisfied by the two different terms of the Hessian matrix.
Since the coefficients of the first partial derivatives of ε_t(θ) were shown to be absolutely summable in the proof of theorem 3.2 for θ ∈ Θ*(κ), the first term in (C.17) directly satisfies the condition for (C.8) and thus is bounded in probability. It remains to be shown that absolute summability holds for the coefficients of ∂²ε_t(θ)/(∂θ∂θ′). From (C.9), the second partial derivatives are given in (C.18) for j, k = 1, 2, 3. The coefficients in ∂ψ_+(L, θ)/∂θ_j were already shown to be absolutely summable in the proof of theorem 3.2, and thus the first and second term in (C.18) satisfy the absolute summability condition for (C.8). As the coefficients in ψ(L, θ) are absolutely summable, the third term in (C.18) is also bounded by (C.8), so that only the coefficients of the second partial derivatives of ψ_+(L, θ) need to be shown to be absolutely summable. As their sum is bounded by the sum of absolute coefficients of the untruncated polynomial ψ(L, θ) = φ(L, θ)^{-1}, it is sufficient to prove absolute summability for the latter. For this, consider

∂²ψ(L, θ)/(∂θ_j∂θ_k) = 2ψ(L, θ)³ (∂φ(L, θ)/∂θ_j)(∂φ(L, θ)/∂θ_k) − ψ(L, θ)² ∂²φ(L, θ)/(∂θ_j∂θ_k), j, k = 1, 2, 3,    (C.19)

where the coefficients of first and second partial derivatives of φ(L, θ) are shown to be absolutely summable in lemma D.1 in appendix D. Thus, (C.18) satisfies the absolute summability condition for (C.8), so that the Hessian matrix (C.17) satisfies a WLLN.

To prove (C.16), consider

∂³R(θ)/(∂θ_j∂θ_k∂θ_l) = (2/n) Σ_{t=1}^{n} [ (∂²ε_t(θ)/(∂θ_j∂θ_k))(∂ε_t(θ)/∂θ_l) + (∂²ε_t(θ)/(∂θ_j∂θ_l))(∂ε_t(θ)/∂θ_k) + (∂²ε_t(θ)/(∂θ_k∂θ_l))(∂ε_t(θ)/∂θ_j) + ε_t(θ) ∂³ε_t(θ)/(∂θ_j∂θ_k∂θ_l) ], j, k, l = 1, 2, 3,

where absolute summability of the coefficients of the first three terms was already shown. Consequently, for the last term to also satisfy the condition for (C.8), the coefficients of the third partial derivatives of ε_t(θ) need to be shown to be absolutely summable.
The derivatives are

∂³ε_t(θ)/(∂θ_j∂θ_k∂θ_l) = (∂³ψ_+(L, θ)/(∂θ_j∂θ_k∂θ_l)) Δ_+^d y_t + ψ_+(L, θ) (∂³Δ_+^{d−d_0}/(∂θ_j∂θ_k∂θ_l)) Δ_+^{d_0} y_t + r_t(θ),    (C.20)

where r_t(θ) holds the products of first and second partial derivatives of ψ_+(L, θ) and Δ_+^{d−d_0} that have already been shown to satisfy the absolute summability condition for (C.8). The second term in (C.20) directly satisfies the condition for (C.8), so that only the first term remains to be checked. As before, the partial derivatives of the untruncated polynomial are considered, as they are an upper bound for the sum of absolute coefficients of the truncated polynomial. From (C.19),

∂³ψ(L, θ)/(∂θ_j∂θ_k∂θ_l) = 2ψ(L, θ)³ [ (∂²φ(L, θ)/(∂θ_j∂θ_k))(∂φ(L, θ)/∂θ_l) + (∂²φ(L, θ)/(∂θ_j∂θ_l))(∂φ(L, θ)/∂θ_k) + (∂²φ(L, θ)/(∂θ_k∂θ_l))(∂φ(L, θ)/∂θ_j) ] − 6ψ(L, θ)⁴ (∂φ(L, θ)/∂θ_j)(∂φ(L, θ)/∂θ_k)(∂φ(L, θ)/∂θ_l) − ψ(L, θ)² ∂³φ(L, θ)/(∂θ_j∂θ_k∂θ_l), j, k, l = 1, 2, 3.

Absolute summability of the coefficients of the partial derivatives of φ(L, θ) up to order three is shown in lemma D.1 in appendix D. Consequently, (C.20) satisfies the absolute summability condition for (C.8), so that (C.16) holds. Thus, a UWLLN holds for the Hessian matrix, so that pointwise convergence generalizes to weak convergence. This, together with consistency of θ̂ (see theorem 3.2), allows to evaluate the Hessian matrix in (C.12) at θ_0. Analogously to (C.13), it follows from the argument of Robinson (2006, pp. 135-136) that the partial derivatives of ε_t(θ) in (C.17) can be replaced by those of ε̃_t(θ) as n → ∞, and ε_t(θ_0) can be replaced by ε_t, which yields

∂²R_t(θ)/(∂θ_j∂θ_k)|_{θ=θ_0} = (2/n) Σ_{t=1}^{n} [ (∂ε̃_t(θ)/∂θ_j)|_{θ=θ_0} (∂ε̃_t(θ)/∂θ_k)|_{θ=θ_0} + ε_t(θ_0) (∂²ε̃_t(θ)/(∂θ_j∂θ_k))|_{θ=θ_0} ] →_p 2Ω_0^{(j,k)},    (C.21)

as n → ∞. The second term converges to zero in probability, as the second partial derivatives are F̄_{t−1}-measurable, and thus the second term adapted to F̄_t is a stationary MDS.
Solving (C.12) for √n(θ̂ − θ_0) and plugging in the limits for the first and second partial derivatives yields the asymptotic distribution of √n(θ̂ − θ_0) = H(θ̄)^{-1} (1/√n)(∂R(θ)/∂θ)|_{θ=θ_0}.

D Partial derivatives of φ(L, θ)

Lemma D.1 (Absolute summability of partial derivatives). For φ(L, θ) in (D.1), with ε*_t ∼ WN(0, 1), u*_t ∼ WN(0, 1), η*_t ∼ WN(0, 1), and φ_0(θ) = 1, the statements (D.2) to (D.4) hold for all j, k, l = 1, 2, 3 and all θ ∈ Θ, where θ_j denotes the j-th entry of θ = (d, σ²_η, σ²_u)′.

Proof of lemma D.1. The following results are required to prove (D.2) to (D.4). For σ²_ε, note that boundedness of its partial derivatives follows by solving the variance of (D.1) for σ²_ε, for all j, k, l = 1, 2, 3. For the same reason, it follows from (D.1) that c_4(θ, θ_j, θ_k) is bounded. Rearranging gives

∂³[φ(L, θ)σ_ε ε*_t]/(∂θ_j∂θ_k∂θ_l) − (c_5(θ, θ_j, θ_k, θ_l)/(2σ_ε)) φ(L, θ)ε*_t − z_2(θ, θ_j, θ_k, θ_l) = −(c_6(θ, θ_j, θ_k, θ_l)/(2σ_ε)) ε*_t,

where the LHS is a MA process with absolutely summable coefficients for any t by (D.11). As for the first and second partial derivatives, c_6(θ, θ_j, θ_k, θ_l) = O(1) holds, as the contemporaneous ε*_t do not cancel on the RHS. Due to boundedness of c_6(θ, θ_j, θ_k, θ_l), the term (c_6(θ, θ_j, θ_k, θ_l)/(2σ_ε)) φ(L, θ)ε*_t is a MA process with absolutely summable weights. Since all other terms are MA processes with absolutely summable weights, Σ_{i=1}^{t−1} (∂³φ_i(θ)/(∂θ_j∂θ_k∂θ_l)) ε*_{t−i} must also be a MA process with absolutely summable coefficients for the above equality to hold. This proves (D.4).

References

Optimal targeted lockdowns in a multi-group SIR model, NBER Working Paper 27102
An economist's guide to epidemiology models of infectious disease
Coronavirus: Germany restricts social life in 'lockdown light'
Coronavirus: Germany to go into lockdown over Christmas
Coronavirus: Italy's Conte offers hope as travel restrictions end
Coronavirus: What went wrong at Germany's Gütersloh meat factory
Oscillations in U.S.
COVID-19 incidence and mortality data reflect diagnostic and reporting factors
Extracting a common stochastic trend: Theory with some applications
Causal impact of masks, policies, behavior on early covid-19 pandemic in the
The cyclical component of U.S. economic activity
Econometric Theory
Coronavirus digest: Europe toughens restrictions as cases rise
Coronavirus: Germany toughens restrictions as it enters 'decisive' phase
Italy toughens coronavirus measures amid second wave surge
Is consumption too smooth? Long memory and the Deaton paradox
An interactive web-based dashboard to track COVID-19 in real time
Time Series Analysis by State Space Methods: Second Edition
Après-ski: The spread of Coronavirus from Ischgl through Germany
Time series modelling and interpretation
Fractional trends and cycles in macroeconomic time series
Measuring the impact of the German public shutdown on the spread of COVID-19
Trends and cycles in macroeconomic time series
The mathematics of infectious diseases
Postwar U.S. business cycles: An empirical investigation
Estimating the fraction of unreported infections in epidemics with a known epicenter: An application to COVID-19
Gaussian pseudo-maximum likelihood estimation of fractional time series models
CAN-NPI: A curated open dataset of Canadian non-pharmaceutical interventions in response to the global COVID-19 pandemic, Working paper
Why are the Beveridge-Nelson and unobserved-components decompositions of GDP so different?
Uniform convergence in probability and stochastic equicontinuity
Asymptotics for the conditional-sum-of-squares estimator in multivariate fractional time-series models
The relationship between the Beveridge-Nelson decomposition and other permanent-transitory decompositions that are popular in economics
COVID-19 and the welfare effects of reducing contagion, NBER Working Paper 27121
Conditional-sum-of-squares estimation of models for stationary time series with long memory
Exact local Whittle estimation of fractional integration with unknown mean and time trend
Covid-19 in Italy: impact of containment measures and prevalence estimates of infection in the general population
The "good" metric is pretty bad: Why it's hard to count the people who have recovered from COVID-19
Fractionally integrated VAR models with a fractional lag operator and deterministic trends: Finite sample identification and two-step estimation, Working Paper 471
Univariate detrending methods with stochastic trends
Estimation and inference for dependent processes

The author thanks Nicolas Apfel, Uwe Hassler, Timon Hellwagner, Roland Jucknewitz, Alina Prechtl, Veronika Püschel, Lars Schlereth, Rolf Tschernig, Enzo Weber, and the participants of the Department Seminar at the University of Regensburg for very helpful comments.

The limits in (D.9) to (D.11) stem from the first, second, and third partial derivatives of σ_η η*_t + σ_u Σ_{i=0}^{t−1} π_i(d)u*_{t−i} w.r.t. d, while all coefficients of the other partial derivatives are bounded below. Consequently, (D.9) to (D.11) are MA processes with absolutely summable coefficients. Note that this is not sufficient for absolute summability of the partial derivatives of φ(L, θ), as σ_ε in the numerators of (D.9) to (D.11) also depends on θ; see (D.12), whose first term is O(1) due to (D.6).
For the partial derivative of φ(L, θ)σ_ε ε*_t one then has (D.13). From (D.9) it follows that the term on the left hand side (LHS) is a MA process with absolutely summable coefficients for any t. Since the same holds for φ(L, θ)ε*_t, by (D.12) the first term on the right hand side (RHS) is also a MA process with absolutely summable coefficients. Consequently, the difference of the latter two terms on the RHS is also a MA process with absolutely summable coefficients. As the contemporaneous impact of ε*_t cannot cancel, it follows that c_2(θ, θ_j) = O(1) is bounded, and thus the second term on the RHS of (D.13) is a MA process with absolutely summable coefficients. For the equality in (D.13) to hold, it must thus hold that σ_ε Σ_{i=1}^{t−1} (∂φ_i(θ)/∂θ_j) ε*_{t−i} is also a MA process with absolutely summable coefficients for any t, which proves (D.2).

Turning to the second partial derivatives, c_3(θ, θ_j, θ_k) = O(1) is bounded due to (D.2) and (D.7). The second partial derivatives of φ(L, θ)σ_ε ε*_t follow analogously, and z_1(θ, θ_j, θ_k) is a MA process with absolutely summable coefficients due to (D.2). Plugging in ∂²σ²_ε/(∂θ_j∂θ_k) = c_3(θ, θ_j, θ_k) − c_4(θ, θ_j, θ_k) and rearranging terms yields an equality where the LHS is a MA process with absolutely summable coefficients for any t due to (D.10) and (D.14). Again, as the contemporaneous ε*_t cannot cancel out, c_4(θ, θ_j, θ_k) = O(1) is bounded.
Therefore, (c_4(θ, θ_j, θ_k)/(2σ_ε)) φ(L, θ)ε*_t is a MA process with absolutely summable weights, so that for the equality above to hold, Σ_{i=1}^{t−1} (∂²φ_i(θ)/(∂θ_j∂θ_k)) ε*_{t−i} must also be a MA process with absolutely summable weights for any t, which proves (D.3).

Turning to (D.4), the third partial derivatives of the variance parameter σ²_ε can be represented as ∂³σ²_ε/(∂θ_j∂θ_k∂θ_l) = c_5(θ, θ_j, θ_k, θ_l) − c_6(θ, θ_j, θ_k, θ_l), where c_5(θ, θ_j, θ_k, θ_l) holds the products of first and second partial derivatives of σ²_ε and φ(1, θ) that have already been shown to be O(1), as well as ∂³/(∂θ_j∂θ_k∂θ_l)[σ²_η + σ²_u Σ_{i=0}^{t−1} π²_i(d)], which is O(1) as shown in (D.8). Consequently, c_5(θ, θ_j, θ_k, θ_l) = O(1), and the exact expression is omitted for brevity. The third partial derivatives of φ(L, θ)σ_ε ε*_t follow from (D.16), where z_2(θ, θ_j, θ_k, θ_l) holds the products of the first and second partial derivatives of σ²_ε and φ(L, θ) for which absolute summability was shown above. Therefore, z_2(θ, θ_j, θ_k, θ_l) is a MA process with absolutely summable coefficients. Plugging in ∂³σ²_ε/(∂θ_j∂θ_k∂θ_l) = c_5(θ, θ_j, θ_k, θ_l) − c_6(θ, θ_j, θ_k, θ_l)