title: When will the Covid-19 pandemic peak?
authors: Li, Shaoran; Linton, Oliver
date: 2020-09-08
journal: J Econom
DOI: 10.1016/j.jeconom.2020.07.049

We carry out some analysis of the daily data on the number of new cases and the number of new deaths reported by 191 countries to the European Centre for Disease Prevention and Control (ECDC). Our benchmark model is a quadratic time trend model applied to the log of new cases for each country. We use our model to predict when the peak of the epidemic will arise in terms of new cases or new deaths in each country, and the peak level. We also predict how long it will take for the number of new daily cases in each country to fall by an order of magnitude. Finally, we forecast the total number of cases and deaths for each country. We also consider two models that link the joint evolution of new cases and new deaths.

We implement econometric models of daily data on the number of new cases of and new deaths from COVID-19 in countries worldwide. Our benchmark model is a regression of log outcomes on a quadratic trend function for each country. Since April 5th 2020 we have been estimating our model every day and providing outlooks for the future development of the pandemic in different countries. Our primary purpose when we initiated this study was to estimate the turnaround dates, the expected peak-to-trough times, and the expected total number of cases and deaths using only the data at hand, which was before the peak had been reached. The results have been updated daily at the website http://covid.econ.cam.ac.uk/linton-uk-covid-cases-predicted-peak; R code is available upon request. On this website we have also provided various robustness checks: we considered quantile estimation in place of mean estimation to limit the effect of large measurement errors, and we provided a one-step-ahead forecasting exercise that keeps track of the model's performance.

We evaluated the residuals from our model along several dimensions. In particular, we found little evidence of day-of-the-week effects in most countries, although the United States and the United Kingdom do appear to have a seasonal effect in deaths. We find some evidence of autocorrelation in the residuals, but it varies widely across countries, with the mean effect (across countries) being positive autocorrelation. We find some limited evidence of time-varying cross-sectional heteroskedasticity. We find that the error distribution pooled across countries is not far from symmetric for cases and slightly less so for deaths. There appears to be substantial contemporaneous correlation between the errors from cases and deaths within a country (positive) and between cases in one country and another (here, the mean is small and positive, but there is wide dispersion, meaning there are also negative correlations in some cases). The SUR models we fit cannot benefit from GLS, but the error properties can affect the standard errors and test statistics. We also develop a joint model for the evolution of new cases and new deaths, test the restrictions it implies, and estimate the restricted model. Our model is purely statistical: we do not pretend to model the disease dynamics per se, just the data.

✩ Thanks to Vasco Carvalho and Giancarlo Corsetti for comments. This is of course work in progress and subject to errors given the timescale in which the work has been done.
However, the epidemiological models themselves do not have perfect forecasting records, and our approach is complementary to the large literature produced by professional epidemiologists and biostatisticians. One advantage of our model over dynamic epidemiological models is that the publicly available data from many countries are subject to a wide range of errors, which make dynamic models very suspect without some additional model for the measurement error; this typically requires many untestable assumptions.

Literature Review. Many researchers apply or extend the SIR model or its variants (SIRD, SEIR) to model the dynamics of the Covid-19 outbreak; see, for example, Wu et al. (2020), Anastassopoulou et al. (2020), and Lin et al. (2020). A short description of the baseline SIR model is as follows. The fixed population N can be split into three non-intersecting classes: susceptibles (S), infectious (I), and recovered (R). The number of individuals in each of these classes changes over time and satisfies S(t) + I(t) + R(t) = N. The dynamics of each class can be described using ordinary differential equations (ODEs):

dS/dt = −βIS, dI/dt = βIS − αI, dR/dt = αI,

where β is the transmission rate constant, βI is the force of infection, and βIS is the number of individuals who become infected per unit of time. The susceptible individuals who become infected move to the class I, and individuals who recover (the constant recovery rate is α) leave the infectious class and move to the recovered class. Equipped with initial conditions S(0), I(0), R(0), the model can easily be solved.

Elaborations on the basic SIR model that are used in the study of Covid-19 include SIRD (e.g., Anastassopoulou et al., 2020) and SEIR (e.g., Wu et al., 2020, as well as Read et al., 2020). Since daily data are available on the number of deaths, an additional class 'Deceased' (D) can be included as well, which corresponds to the SIRD model. The other popular variant of the baseline SIR model is SEIR, which separately considers susceptibles (S) and exposed (E). Chudik et al. (2020) contrast government-mandated social distancing policies with voluntary self-isolation in an SEIR model. They decompose the population N into two categories: the exposed, N_E, and the rest, N_I, who are isolated. The strength of a mitigation policy can be measured by 1 − λ, where λ = N_E/N; this integrates social distancing policy into the traditional SIR model. They evaluate the costs and benefits of alternative societal decisions on the degree and nature of government-mandated containment policies by considering alternative values of λ in conjunction with an employment loss elasticity α. They find that the employment loss can be reduced if the social distancing policy is targeted towards people who are most likely to spread the infection. Other articles also discuss this class of models; see Wu et al. (2020), Read et al. (2020), and Peng et al. (2020) for details.

One feature of Covid-19 is that different sub-populations face different risks. Taking that into account, Acemoglu et al. (2020) develop a heterogeneous-agent multi-risk SIR model (MR-SIR) where infection, hospitalization, and fatality rates vary between groups. Describing the laws of motion of the susceptible, infectious, and recovered populations by group, they find that better social outcomes can be achieved with a targeted policy that applies an aggressive lockdown to the oldest group instead of a uniform lockdown policy.
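For concreteness, the baseline SIR dynamics just described can be simulated directly. The following is a minimal sketch in R using the deSolve package; the parameter values and initial conditions are purely illustrative choices, not estimates from any country's data.

```r
library(deSolve)  # ODE solver

# Right-hand side of the baseline SIR model, in proportions of the population:
#   dS/dt = -beta*S*I,  dI/dt = beta*S*I - alpha*I,  dR/dt = alpha*I.
sir_rhs <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dS <- -beta * S * I
    dI <-  beta * S * I - alpha * I
    dR <-  alpha * I
    list(c(dS, dI, dR))
  })
}

# Illustrative transmission and recovery rates and initial conditions.
parms <- c(beta = 0.3, alpha = 0.1)
state <- c(S = 0.999, I = 0.001, R = 0)
out   <- ode(y = state, times = 0:180, func = sir_rhs, parms = parms)
head(out)  # columns: time, S, I, R
```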
However, these kinds of models place heavy demands on data availability and quality, to which the estimates are very sensitive. One limitation of Covid-19 data is that the numbers of susceptible and recovered people are seldom reported. Beyond this, Li et al. (2020) point out the limitations of SIR-style models, arguing that the real situation could be much more complicated and changing all the time.

Another strand of time series analysis originates from the Generalized Logistic Model (GLM), first proposed by Richards (1959) for modelling growth curves. Allowing for more flexible S-shaped growth, the infection curve Y(t) evolves as

Y(t) = A + (K − A)/(1 + Q e^{−Bt})^{1/v},

where A and K are the lower and upper asymptotes; B is the growth rate; v affects near which asymptote maximum growth occurs; and Q is related to the value Y(0). An extension of the GLM called sub-epidemic modelling was studied by Chowell et al. (2019), who model each sub-epidemic by a GLM and then combine a set of n overlapping sub-epidemics to describe the whole epidemic wave. Roosa et al. (2020) extend the aforementioned GLM to fit the cumulative number of confirmed Covid-19 cases and forecast short-run new cases in Hubei and other provinces in China; they find that the S-shaped curve fits the initial data well, based on information available up to February 9th, 2020. They also compare their model with the Richards curve and with sub-epidemic modelling. All models provide good visual fits to the epidemic curves, and the GLM estimates consistently indicate that epidemic growth is nearly exponential in Hubei and sub-exponential in other provinces at early stages.
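As an illustration of the GLM just reviewed, the Richards curve can be fitted to a cumulative case series by nonlinear least squares. The sketch below uses a synthetic series; the function name `richards` and the starting values are our own illustrative choices, and real data would typically require country-specific tuning of the starting values.

```r
# Generalized logistic (Richards) growth curve for cumulative incidence:
#   Y(t) = A + (K - A) / (1 + Q * exp(-B * t))^(1 / v).
richards <- function(t, A, K, B, v, Q) A + (K - A) / (1 + Q * exp(-B * t))^(1 / v)

# Illustrative fit by nonlinear least squares to a synthetic cumulative series.
set.seed(1)
t_obs <- 1:100
y_cum <- richards(t_obs, A = 0, K = 5e4, B = 0.12, v = 1, Q = 50) +
         rnorm(100, sd = 200)
fit <- nls(y_cum ~ richards(t_obs, A, K, B, v, Q),
           start = list(A = 0, K = max(y_cum), B = 0.1, v = 1, Q = 40))
coef(fit)  # estimated asymptotes, growth rate, and shape parameters
```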
Liu et al. (2020) consider a panel regression model for the infection growth rates. Like us, they suppose a deterministic trend, in their case a piecewise linear trend for the growth rates with a breakpoint at the peak of the curve. They also explicitly include first-order autocorrelation in the error term, which after quasi-differencing introduces a lagged dependent variable into the mean equation. They assume a common value of the autocorrelation across countries, but allow the other parameters to vary across countries. They specify prior distributions for all parameters; for example, the prior for the common autocorrelation is a normal with mean 0.5 truncated to [0, 0.99]. They assume normality for the innovation process of the regression, independence of these error terms across countries, and independence of the random coefficients across countries. They provide Bayesian inference and predictive density forecasts. This paper has some impressive results. However, it is not clear what the evidence base is for some of their choices, since they do not elaborate on them. In fact, we have found negative autocorrelations in the residuals from our mean model to be quite common (perhaps due to stale reporting in some countries, cf. Lo and MacKinlay, 1990); note also that Liu et al. work with three-day moving averages of the raw data, which is likely to induce positive autocorrelation. Also, we find strong evidence of cross-sectional correlation between the error terms, which seems eminently plausible, as countries like Belgium and the Netherlands, say, share many common shocks, a point that has led to the developments in Hafner (2020).

Apart from the GLM, several other time trend modelling approaches are worth reviewing, some of which support a parabolic time trend for (log) daily new cases and deaths. Deb and Majumdar (2020) show that a time-dependent quadratic trend successfully captures the log incidence pattern of the disease among Chinese provinces. Li et al. (2020) use time series modelling to fit the infections and fatalities in China; an apparent quadratic trend and turning points are found. They also demonstrate that the distribution of daily deaths is similar to that of daily infections with a 5 to 6 day delay. Furthermore, Flaxman et al. (2020) develop a semi-mechanistic Bayesian hierarchical model to accommodate the impact of government interventions on both daily new infections and fatalities among 11 European countries. In particular, most of their online plots based on data from the European CDC exhibit a second-order polynomial time trend, which is consistent with the evidence from China. Moreover, Hafner (2020) considers both serial correlation and spillover effects of Covid-19 via spatial autoregressive models. Finally, Hu et al. (2020) apply artificial intelligence methods to real-time forecasting of Covid-19 case data. Using cumulative confirmed case data until the end of February 2020, they predicted that the provinces/cities would eventually reach plateaux, although at varying times.

Trend modelling has been a major activity in econometrics. The class of nonstationary processes is extremely broad, and different types of nonstationarity can generate quite different behaviour and require quite different analytical techniques. There are two main approaches to depicting the structure of nonstationary data. One is the unit root theory for integrated time series, or similar techniques for fractionally integrated time series that cover the unit root process as a special case; this theory and the associated techniques are studied and developed by Park and Phillips (1999, 2001), Marinucci and Robinson (2001), Hualde et al. (2011), and Wang (2014, 2015), among others. The other is locally stationary processes and deterministic trends. In economics, many data series feature increasing trends that are quite close to linear. For epidemic data the situation is more complex: typically there is an upward exponential growth, a plateau, and a declining part, all of which need to be captured.

We model the number of new cases and new deaths per day for each country or territory. For national health services, these are key quantities that determine peak resource needs. Let y_it denote either the number of new cases or the number of new deaths in country i on day t. We sometimes add one to the count, as this is necessary for some countries with sparse data records or at early stages of the epidemic when zero counts were common. We also sometimes work with the data normalized by population, i.e., (y_it + 1)/n_i, where n_i is the population of country i, and with rescaled time. Division by population only affects the constant term in our regressions, but is done to aid comparability across countries. We suppose that for some monotonic transformation τ (such as the logarithm),

τ(y_it) = x_it = m_i(t) + ε_it,   (1)

where m_i is the trend in the mean of the process x_it, i.e., m_i(t) = E(x_it), and the error term ε_it is mean zero. We also define the mean of the original process y_it as M_i(t) = E(y_it). We allow all aspects of the model to vary across countries without restriction, but we sometimes drop the subscript i in the discussion for convenience. Although the data are counts, i.e., integer valued, we do not impose this property; this can matter a bit at the beginning and at the end of the trajectory for small countries, but otherwise it does not seem necessary or helpful to impose some Poisson-type process for large countries in the middle of the episode.
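A minimal sketch of the benchmark quadratic trend regression just described, assuming the input `y` is a vector of daily new cases (or deaths) for one country; the function name and the default K = 100 window are our illustrative conventions.

```r
# Benchmark per-country regression: log(y_t + 1) = a + b*t + c*t^2 + e_t over the
# most recent K days, with time rescaled to (0, 1].
fit_quadratic_trend <- function(y, K = min(100, length(y))) {
  y <- tail(y, K)
  t <- seq_len(K) / K            # rescaled time
  x <- log(y + 1)                # add one to accommodate zero counts
  lm(x ~ t + I(t^2))
}
# Example: coef(fit_quadratic_trend(y)) returns (a, b, c); with c < 0 the
# implied peak is at t = -b/(2c) in rescaled time.
```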
We focus on the mean regression, and we do not restrict the error process beyond the zero-mean property; it may be autocorrelated and heteroskedastic within some limits. Nevertheless, statistical folklore suggests that the transformation can help in reducing heteroskedasticity and other issues that arise from the non-negative count nature of y (Box and Cox, 1964). The long-run behaviour is fully determined by the trend m. We discuss both nonparametric and parametric (local and global) approaches to choosing m. Other models involve dynamic processes for y_t; one problem with this approach is the presence of measurement error in y, which in our framework is simply averaged out, whereas in a dynamic model some additional structure regarding the measurement error would have to be imposed to mitigate its effects on parameter estimation.

We first discuss the transformation τ. We have considered the Box-Cox family of transforms τ_v(y) = (y^v − 1)/v, where v is an unknown parameter. In several of the web updates we provide a discussion of inference in this model and provide predictions based on estimating the parameter v. However, we found (see the website) that identification of the parameter v from short time series is challenging, so in the sequel we suppose that v = 0 is given, i.e., τ is the logarithm of y or y + 1. (We use natural logarithms and base-10 logarithms interchangeably; the latter is useful for the interpretation of graphs, while the former has more convenient notation.)

We next discuss the trend function m. We consider polynomials in t and log(t). Suppose that

m(t) = a + bt + ct²,   (2)

where a, b, c are parameters to be determined. A quadratic is the simplest function that reflects the possibility of a turning point. A well-defined maximum of m occurs if and only if c < 0, at the time τ_max = −b/2c, which results in the maximal value of (log) cases per day of m(τ_max) = a − b²/4c; finally, the value of t after which no cases would be reported (the end of the epidemic) is the larger root of m(t) = 0. A quadratic function can also be rewritten in vertex form, m(t) = α + γ(t − µ)², where µ = τ_max, α = m(τ_max), and γ = c. The vertex form is good for interpretation, since it is parameterized in terms of the quantities of interest (but it does require some nonlinear estimation, whereas the parameters of the standard form all enter linearly, which makes it easier to implement).

We consider a generalization of the vertex form of the quadratic model,

m(t) = α + γ|t − µ|^λ,   (3)

where α, γ, µ, λ are parameters to be determined. For any λ, a well-defined maximum of m occurs if and only if γ < 0, at the time τ_max = µ, which results in the maximal value of (log) cases per day of m(τ_max) = α. The parameter λ controls the shape of the curve around the peak. When λ > 2, this is called the leptobottomed case; that is, the peak is elongated relative to the λ = 2 case. When λ = 1, the platybottomed case, the peak is sharper than in the λ = 2 case. The parameter γ measures the speed of approach to the peak and of decline from the peak. We look primarily at the values λ = 2 and λ = 4. In the case λ = 2, Eqs. (2) and (3) are in one-to-one relation. The case λ = 4 is a special case of a quartic polynomial. For any λ this model is symmetric about the peak, which enables prediction from before the peak is achieved, although it is restrictive.

We have also worked with the mixed polynomial-logarithmic functional form

m(t) = a − bt + c log(t),   (4)

which is consistent with the typical unimodal trajectory under some restrictions.
It suffices that b > 0 and c > 0, in which case the peak is located at time c/b. The curve can be reparameterized as m(t) = α − βt + βµ ln(t), where t = µ is the peak time. This functional form can generate asymmetric behaviour around the peak. The UK government promoted this type of curve, perhaps as a way of encouraging ''prudent'' behaviour during the lockdown easing process (UK gov, 2020).

Our model is for the transformed value of cases; the implied model for the cases themselves is obtained by taking the inverse transformation. Specifically, for the general model (1) we obtain M(t) = E[τ⁻¹(m(t) + ε_t)]. If the distribution of ε_t is time invariant and τ is increasing, then the maxima of M and m are achieved at the same time. In the logarithmic special case, M(t) = exp(m(t))κ₀, where κ₀ = E[exp(ε_t)]. If ε_t were N(0, σ²), then κ₀ = exp(σ²/2), and for small σ² we have κ₀ ≃ 1 + σ²/2; macroeconomists refer to this as the Jensen's inequality term and often drop it (Campbell, 1993). We do not specify any distributional shape for ε_t, since we do not need to, and we account for the presence of the stochastic error term in our predictions by estimating κ₀.

In the case where τ is logarithmic, when λ = 2 the model m is quadratic in time and the implied model for M(t) is proportional to a Gaussian density. When λ = 4, the implied model for M(t) is proportional to a ''fat-bottomed'' density with a flatter peak and faster curvature before and after the peak. The mixed functional form (4) for m implies a shape for M(t) proportional to a gamma density, i.e., M(t) ∝ t^c exp(−bt).

The curves m(·) and M(·) are themselves of interest and follow directly once the parameters are known or estimated. Parameters of interest are the time τ_max = µ at which the maximum of m and M occurs and the maximal value of (log) cases per day, m(τ_max), which in the model (3) is α, since this is embedded in the vertex form. Transforming back to cases, we obtain for that model

y_max = exp(α)E[exp(ε_t)] = exp(α)κ₀,

which is the maximum of M(t). For the log-and-linear curve (4), the peak is located at τ_max = c/b = µ, with m(τ_max) = α + βµ(log(µ) − 1) and y_max = exp(α − βµ)µ^{βµ}E[exp(ε_t)]. The peak time could be in the future relative to the estimation window or could already have passed.

We may be interested in the expected first passage time to some trough level from the peak. For example, suppose that the trough is defined as the expected peak number of new cases divided by some number L. Then

t_TP = inf{t > τ_max : m(τ_max) − m(t) ≥ log L}.

For the quadratic model, this is equivalent to finding the first point t > τ_max for which (t − µ) ≥ √(log L/(−γ)).

We are also interested in the total number of cases that would occur in a given country, which is T = Σ_t M(t). We work with rescaled time below, that is, t ↦ t/K for some large K. In this framework we can replace the sum by an integral and obtain a closed form in terms of the parameters. In the case λ = 2, this is approximately

T ≈ K κ₀ exp(α)√(π/(−γ)),   (7)

and in the case λ = 4,

T ≈ K κ₀ exp(α) Γ(1/4)/(2(−γ)^{1/4}).   (8)

Finally, for the (4) family we have

T ≈ K κ₀ exp(a) Γ(c + 1)/b^{c+1}.   (9)

The reproduction number, also known as R or R₀, is the average number of people that one person with an infectious disease will likely infect in the future. It measures how fast the epidemic will increase over time. In the context of our model, we may define R₀ as m′(t) + 1 (it measures the rate of growth of the epidemic); this has the property that R₀ starts out above one, declines to one at the peak, and then declines further towards zero as the episode ends.
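The quantities derived above (peak level, first passage time to the trough, and the totals (7)-(8)) can be computed from estimated vertex-model parameters as in the following sketch; the function name and argument conventions are ours, and time is in the rescaled units used for estimation.

```r
# Implied summaries from the vertex model m(t) = alpha + gam*|t - mu|^lambda,
# gam < 0, on the log scale; kappa0 estimates E[exp(eps_t)]. The trough rule
# generalizes the quadratic case: gam*(t - mu)^lambda = -log(L). Totals are per
# unit of rescaled time; multiply by the window length K to convert to days,
# as in eqs. (7)-(8). The (4) family, eq. (9), is not covered here.
peak_summary <- function(alpha, gam, mu, lambda = 2, kappa0 = 1, L = 10) {
  peak_level <- exp(alpha) * kappa0                     # maximum of M(t)
  trough_day <- mu + (log(L) / (-gam))^(1 / lambda)     # first time M falls to peak/L
  total <- if (lambda == 2) {
    kappa0 * exp(alpha) * sqrt(pi / (-gam))             # eq. (7), per rescaled unit
  } else {
    kappa0 * exp(alpha) * gamma(1 / 4) / (2 * (-gam)^(1 / 4))  # eq. (8), lambda = 4
  }
  c(peak_day = mu, peak_level = peak_level, trough_day = trough_day, total = total)
}
```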
We suppose that time is relative to December 31st 2019, which is the putative starting point of the epidemic. However, the first time point t_min for a given country may be some time after this. The last time point t_max we hope is finite, so that y_t = 0 and log(y_t + 1) = 0 for all t < t_min and t > t_max. It follows that there are a finite, albeit potentially quite large, number of time periods with information about m. We may use the traditional long-horizon setting subject to this limitation. Instead, we use an infill asymptotic scheme. We suppose that time is rescaled according to the number of observations being used. That is, if we label today as time t₁ and use the K most recent observations, we suppose they fall in the interval [t₀, t₁] for some t₀, so that the observation times are t₀ + (t₁ − t₀)/K, . . . , t₁ (with an abuse of notation we say t = 1, . . . , K). As K → ∞, our observations fill up the interval [t₀, t₁]. The data generating process is consequently a triangular array indexed by K. The model, however, is assumed to operate over all time, i.e., from −∞ to ∞ (or from 0 to +∞ for the model containing a logarithmic trend). We note that some parameters are relative to the coordinate system [t₀, t₁] induced by K, and we may want to convert them back to calendar time (relative to December 31st 2019, or to today), which we do by a linear transformation. If we interpret the model as parametric and true for all data (the model is global), the model can extrapolate both forward and backward.

Consider the nonparametric point of view. In this case the function m(·) is not specified except as a smooth function of (rescaled) time, and our quadratic regressions can be interpreted as approximations (the model is local). In this case we take the one-sided estimation window of size K to be a small fraction of the total available observations T; the interpretation is that we are estimating the level of the regression function at the point u ∈ [t₀, t₁] using data from previous periods and a local quadratic fit. For the local quadratic regression (Fan and Gijbels, 1996; Gozalo and Linton, 2000), the curvature coefficient corresponds to γ ≈ m″(u)h²/2, where h is a bandwidth parameter (here 1/K). This suggests that the γ parameter may be hard to estimate, since it depends on the second derivative of the regression function and is also scaled by a small quantity. In this case, the model itself has limited extrapolative properties based on the assumption of smoothness.

Our estimation methodology is very simple. We work with daily data that are available for all countries since December 31st 2019. Each country in the database has a day of first case t^c_min and a day of first death t^d_min; these vary by country, with China having its first recorded cases (as far as this database is concerned) on December 31st, while other countries enter into the fray subsequently. The current time period is denoted t₁, and so we have data {y^c_it, y^d_it, t = 1, . . . , K, i = 1, . . . , n}, which are count data with many zeros at the beginning. (For the model (4), t must be positive, and so we define time from 1 to T.) Our procedure is as follows; a code sketch implementing steps 1-3 appears at the end of this section.

1. To estimate (3) we fix λ = 2 or λ = 4 and estimate the parameters α, γ for given µ by OLS of the log of counts plus one (log(y + 1)) for each country, using the estimation window data, which contain the most recent K datapoints with rescaled time (t/K). We then search for the minimum squared error across different µ.
For (4) we estimate by OLS directly. We allow country-specific values of all parameters, separately for cases and deaths. We also include day-of-the-week dummy variables in each regression. We provide standard errors for the parameter estimates based on NLLS theory, which is detailed in the Appendix. We use the LS standard errors for simplicity: given the small sample size we have for each country, HAC-based standard errors can be subject to a lot of noise. We comment on the residual properties below.

2. We then extrapolate the estimated regression curves outside the estimation window and take exponentials to deliver predictions of the number of new cases and new deaths per day. That is, we calculate m̂(s) and M̂(s) = exp(m̂(s))κ̂₀ for any s from −∞ to +∞, where κ̂₀ = Σ_{t=1}^{K} exp(ε̂_t)/K and ε̂_t = log y_t − m̂(t) are the least squares residuals. We forecast y_s for any time s by M̂(s); this forecast is valid asymptotically, i.e., as the estimation error disappears.

3. We provide frequentist prediction intervals for M(s) for any fixed s, [exp(m̂(s) + q̂_{α/2}), exp(m̂(s) + q̂_{1−α/2})], where q̂_α is the α-quantile of the residuals {ε̂_t, t = 1, . . . , K}. As the estimation error disappears (K → ∞), this interval will contain M(s) with probability converging to 1 − α.

4. We provide a forecast of the total number of cases: the total number of cases so far plus Σ_{s>today} M̂(s). Alternatively, we use the formula for the total number of cases from the model m, which is approximately given by (7), (8), and (9).

5. Selection of K. Our choice of K is often limited by data availability. We investigate in Section 9 a specific algorithm for choosing K.

When viewed as a parametric model, the choice of specification is crucial for obtaining consistent estimates of the parameters. However, one robustness result does hold: in the vertex model with λ₀ > 0, provided there is data on either side of the peak, our estimation procedures will robustly estimate the location of the peak µ for other values of λ, although the other parameters will not be consistently estimated.

Regarding the prediction intervals, a full disclosure: these are intended to assist the reader in gauging the fundamental uncertainty that would be present were the parameter values known and the model true. The reality is that neither of these conditions is met. If one takes account of parameter uncertainty, then the prediction intervals expand rapidly with the horizon. Technically, the parameter uncertainty overwhelms the prediction uncertainty when the horizon is greater than the square root of the estimation sample size. We would argue that this is true of any model in this setting, unless one believes in Bayesian magic. We discuss short-term prediction intervals that take account of parameter uncertainty in more detail in the Appendix.

We use daily data on new cases and fatalities downloaded from the website of the European Centre for Disease Prevention and Control (ECDC), an agency of the European Union. According to that website, the first case worldwide was recorded on December 31st 2019 (day 1). We have the daily number of (new) cases and the number of (new) deaths up to today's date, which at the time of writing is 180+ days since day 1. These are count data with some zeros initially, but the counts get quite big quite quickly for the major countries. We consider the 191+ countries and entities in the ECDC dataset but report separately only the thirty countries with the largest number of cases (excluding China) and with at least K days of data.
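Returning to steps 1-3 above, a compact sketch of the profile grid search over µ, the extrapolation with the κ₀ correction, and the residual-quantile prediction interval; day-of-the-week dummies are omitted for brevity, and all names are our illustrative conventions.

```r
# Profile least squares for log(y_t + 1) = alpha + gam*|t - mu|^lambda + e_t:
# for each candidate mu, (alpha, gam) follow by OLS, and mu minimizes the SSR.
fit_vertex <- function(y, lambda = 2, mu_grid = seq(0, 2, by = 0.01)) {
  K <- length(y)
  t <- seq_len(K) / K                       # rescaled time in (0, 1]
  x <- log(y + 1)
  ssr <- sapply(mu_grid, function(mu)
    sum(resid(lm(x ~ I(abs(t - mu)^lambda)))^2))
  mu  <- mu_grid[which.min(ssr)]            # mu > 1 means a peak after the window
  fit <- lm(x ~ I(abs(t - mu)^lambda))
  list(alpha = unname(coef(fit)[1]), gam = unname(coef(fit)[2]),
       mu = mu, lambda = lambda, fit = fit)
}

# Point forecast M(s) = exp(m(s))*kappa0 and a residual-quantile prediction
# interval, ignoring parameter uncertainty; s is in the same rescaled units.
predict_cases <- function(vf, s, a = 0.05) {
  eps    <- resid(vf$fit)
  kappa0 <- mean(exp(eps))                  # estimate of E[exp(eps_t)]
  m_s    <- vf$alpha + vf$gam * abs(s - vf$mu)^vf$lambda
  q      <- quantile(eps, c(a / 2, 1 - a / 2))
  c(point = exp(m_s) * kappa0, lower = exp(m_s + q[[1]]), upper = exp(m_s + q[[2]]))
}
```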
Specifically, cases are the reported daily count of detected and laboratory-confirmed (and sometimes, depending on the country reporting them and the criteria adopted at the time, also clinically confirmed) positive cases, and sometimes, depending on the country's reporting standards, also presumptive, suspect, or probable cases of detected infection. The size of the gap between detected (whether confirmed, suspect, or probable) and reported cases versus actual cases will depend on the number of tests performed and on the country's transparency in reporting. Most estimates have put the number of undetected cases at several multiples of the detected cases. There are a number of reporting issues, including official governmental channels changing or retracting figures, or publishing contradictory data on different official outlets. National or state figures with old or incomplete data compared to regional or local (county-level, in the US) government reports are the norm.

We re-estimate every day as new data come in. We fully expect the parameters to change over time and to vary across countries, and they do. We fit the models (3) and (4) on each country's case and death data separately with the most recent K datapoints (using rescaled time with estimation window [t₀, t₁]) and report the most recent results below in: Table 1a (cases, λ = 2), Table 1b (cases, λ = 4), Table 1c (cases, model (4)), Table 2a (deaths, λ = 2), Table 2b (deaths, λ = 4), and Table 2c (deaths, model (4)). We use K = 100 here.

The regressions generally have high R² (not reported). There is quite a bit of heterogeneity across the parameters, consistent with different countries being at different stages of the cycle, having taken different approaches to managing the epidemic, and having different demographics. By now most countries have µ significantly in the past, which means their peak has passed, but a number of countries have yet to reach their peak. The γ parameters are mostly negative, but the standard errors are quite wide, and in some cases the confidence intervals around these estimates include zero. The y_max parameter is quite well estimated for most countries, with some exceptions. Most countries do not exhibit strong seasonal effects, although generally Monday and Tuesday seem to report lower cases and deaths than other days of the week (see Table 2c). We provide likelihood ratio test statistics ℓ that can be used to test the differences between the three models; the critical value for the likelihood ratio test is χ²₀.₉₅(1) = 3.84. In the past there was not much to choose between the models for most countries. Currently, the US strongly prefers the fat-bottomed model, Brazil favours the quadratic regression, whereas the UK favours model (4).

Tables 1a-1c report the estimates of log(case_t + 1) = α + γ|t − µ|² + Σ_{d=1}^{6} β_d D_d (Table 1a), log(case_t + 1) = α + γ|t − µ|⁴ + Σ_{d=1}^{6} β_d D_d (Table 1b), and log(case_t + 1) = a + bt + c log(t) + Σ_{d=1}^{6} β_d D_d (Table 1c). In each table, µ is the estimated day of the peak, taking 2019-12-31 as day 0, and the number of * in the last column denotes the relative goodness of fit based on the log likelihood, where *** indicates the best-fitting model.
Tables 2a-2c report the corresponding estimates for deaths: log(death_t + 1) = α + γ|t − µ|² + Σ_{d=1}^{6} β_d D_d (Table 2a), log(death_t + 1) = α + γ|t − µ|⁴ + Σ_{d=1}^{6} β_d D_d (Table 2b), and log(death_t + 1) = a + bt + c log(t) + Σ_{d=1}^{6} β_d D_d (Table 2c), with the same conventions: µ is the estimated day of the peak, taking 2019-12-31 as day 0, and the number of * in the last column denotes the relative goodness of fit based on the log likelihood, where *** indicates the best-fitting model.

We present the estimated curves along with their 95% prediction intervals for selected countries in the graphs below: the estimated curve in solid blue, prediction intervals in dotted blue, and data points in red. The extrapolation curve is a scaled density function, as mandated by our curve models (see Figs. 1-3).

We estimate the time that it has taken selected countries to pass from peak to trough, where the trough is defined as the estimated peak number of new cases divided by 10. Admittedly, this is not a very exacting standard, but the advantage is that several countries have effectively satisfied it. An empirical estimator is based on the first passage time t̂_TP = inf{t > τ_max : log₁₀(y_{τ_max}) − log₁₀(y_t) ≥ 1}, where τ_max = arg max_s log₁₀(y_s). For New Zealand, Australia, and Austria, t̂_TP = 14; for South Korea, t̂_TP = 15; for China, t̂_TP = 19; for Norway, t̂_TP = 21; for Israel, t̂_TP = 21; while for Switzerland, t̂_TP = 31. These estimates are rather rough, and in each case there was some bounce back. Incidentally, looking at these successful countries' trajectories, there seems to be a fairly symmetric curve around the peak; see below the case of New Zealand. We can also estimate the passage time from the model for countries that have not yet passed the threshold, by computing the first t > µ for which m̂(µ) − m̂(t) exceeds the log of 10; for the quadratic model this is equivalent to (t − µ) ≥ √(−1/γ), with m measured in base-10 logarithms. For the UK, this gives around 28 days based on the latest estimates, although the confidence interval is rather wide.

The choice of K. This section illustrates the selection of the window width K. Let today (2020-06-26) be day 0 for this exercise; K ranges from 14 to 100, and L is the length of the test data. Due to missing data, we list only 27 of the top 30 countries. We compute the one-step-ahead forecast m̂(1 − l; K) of log y_{1−l} based on the data from {−l, −l − 1, . . . , −l − K} for l = 1, 2, . . . , L, and calculate the average prediction loss over the L test days. We show the empirical results for both daily infections and fatalities below. Although K = 100 is not the best choice for all of the top-case countries, it still works well for most of them.
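The window-selection exercise can be sketched as follows, reusing the functions from the earlier sketches; we use the absolute error as an illustrative loss, since it is one plausible reading of the loss reported in the tables below.

```r
# One-step-ahead evaluation for choosing the window width K: for each of the
# last L days, fit on the preceding K observations and forecast one day ahead.
# fit_vertex() and predict_cases() are the sketches given earlier.
one_step_loss <- function(y, K, L = 5, lambda = 2) {
  losses <- sapply(seq_len(L), function(l) {
    n    <- length(y) - l                 # forecast origin; target is y[n + 1]
    vf   <- fit_vertex(y[(n - K + 1):n], lambda)
    yhat <- predict_cases(vf, s = (K + 1) / K)[["point"]]
    abs(y[n + 1] - yhat)                  # illustrative absolute loss
  })
  mean(losses)
}
# Example: choose the K in 14:100 that minimizes one_step_loss(y, K).
```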
One-Step-Ahead Prediction Loss of Cases.

Country        Min. avg. loss, K in [14, 100]     Avg. loss at K = 100
               1a      1b      1c                 1a      1b      1c
USA            6463    9480    5186               16247   14250   16755
Brazil         3263    2395    2753               6139    5737    4130
Russia         103     845     120                2440    2757    1499
India          2761    1004    2612               5348    4902    4529
UK             109     93      109                292     258     267
Peru           1553    1800    1583               1677    2968    1885
Chile          1115    949     1100               3606    1143    3840
Italy          94      67      98                 139     165     140
Iran           38      100     46                 1102    1086    1014
Mexico         956     547     872                1448    1072    1050
France         207     202     211                207     304     213
Pakistan       706     1012    825                4907    3081    5554
Turkey         115     64      137                679     479     717
Germany        74      50      63                 210     110     246
Saudi Arabia   77      95      86                 302     745     594
South Africa   346     297     371                364     3837    397
Canada         32      21      27                 64      65      44
Qatar          163     160     221                330     1110    613
Colombia       917     681     875                1248    1595    1194
Sweden         797     787     798                1310    1368    1281
Egypt          361     554     420                1256    767     1310
Belgium        86      75      88                 91      95      90
Belarus        196     209     203                257     1132    443
Argentina      1099    1067    1108               1680    2023    1643
Indonesia      47      62      46                 155     213     155
Netherlands    6       14      6                  13      14      14
UAE            40      76      43                 127     633     170

This table summarizes the average one-step prediction loss for new daily cases when L = 5, with K ranging from 14 to 100 days; column labels 1a, 1b, 1c refer to the three models (λ = 2, λ = 4, and model (4), respectively). The first three columns present the minimum average loss over that range of K; the corresponding results for K = 100 are provided for comparison.

One-Step-Ahead Prediction Loss of Deaths.

Country        Min. avg. loss, K in [14, 100]     Avg. loss at K = 100
               1a      1b      1c                 1a      1b      1c
Brazil         48      70      58                 225     89      120
Russia         16      19      20                 22      60      31
India          27      26      28                 57      53      43
UK             43      29      45                 74      75      72
Peru           86      80      84                 102     82      89
Chile          34      48      35                 76      155     69
Italy          18      15      18                 18      17      18
Iran           13      3       20                 19      20      26
Mexico         240     268     245                344     284     288
France         8       6       8                  12      9       11
Pakistan       27      28      30                 47      33      50
Turkey         2       3       2                  15      12      15
Germany        4       4       4                  8       8       7
Saudi Arabia   2       2       2                  5       12      4
South Africa   20      14      20                 32      110     32
Canada         7       11      9                  7       14      9
Qatar          2       1       2                  2       2       2
Colombia       31      31      39                 49      58      49
Sweden         13      13      14                 24      25      23
Egypt          38      33      38                 45      58      47
Belgium        5       5       5                  5       5       5
Belarus        1       1       1                  1       1       1
Argentina      9       8       9                  11      13      11
Indonesia      4       3       4                  4       6       4
Netherlands    1       1       1                  1       1       1
UAE            0       0       0                  1       1       1

This table summarizes the average one-step prediction loss for new daily deaths when L = 5, with K ranging from 14 to 100 days. The first three columns present the minimum average loss over that range of K; the corresponding results for K = 100 are provided for comparison.

Table 3 presents the results of the one-step-ahead prediction of new cases for 2020-06-26, along with the reported values and t-statistics of the predictions; Table 4 presents the corresponding results for new deaths. All three models are applied, K = 100, and τ_PY is the joint prediction test statistic over all listed countries, as in Pesaran and Yamagata (2012); see the Appendix.

In this section we look at the properties of the residuals from the trend fitting. The model assumptions do not exclude autocorrelation or heteroskedasticity, but there is limited scope to improve efficiency by exploiting these properties. They would, however, affect standard errors and might suggest alternative short-term predictors (see Tables 5 and 6).

Autocorrelation. We first estimate the autocorrelation function of {ε̂_it, t = 1, 2, . . . , K}, denoted ρ̂_i(j), j = 1, 2, . . . , J, for all available countries that have at least 100 observations (currently n = 150 countries). We take J = 21 to allow for long lag effects, which might be predicted for deaths by epidemiological models. We calculate the estimated mean value of ρ̂_i(j) across countries along with standard errors that take account of the cross-sectional averaging but allow for cross-sectional correlation, as in Linton (2020). The pattern of autocorrelation is similar for both cases and deaths for all three models: weak positive correlation at low lags, declining across the horizon to negative autocorrelation after around ten days, and thereafter remaining at a very low level close to zero.
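The cross-country mean autocorrelations just described can be computed as in this sketch, where `resid_list` is a hypothetical list holding one residual vector per country.

```r
# Cross-country average of residual autocorrelations rho_i(j), j = 1, ..., J.
mean_acf <- function(resid_list, J = 21) {
  acfs <- sapply(resid_list, function(e)
    acf(e, lag.max = J, plot = FALSE)$acf[-1])   # drop the lag-0 term
  rowMeans(acfs)                                  # mean rho(j) across countries
}
```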
Distributional properties. We next show the kernel density estimate of the pooled standardized residuals for both cases and deaths for all three models; it appears not far from Gaussian in shape, or at least roughly symmetrical.

Cross-sectional correlation. We analyse cross-sectional correlation by computing the n × n pairwise correlation matrix of the time series residuals, and we compare the density of the pairwise correlations with the normal density. For cases, the mean value of the pairwise correlations is very close to zero and is distributed similarly to a normal distribution for all three models. However, for deaths, the mean value is slightly positive: 0.18 for the quadratic model (2a), 0.09 for the quartic model (2b), and 0.13 for the gamma model (2c).

Heteroskedasticity. We look at time-varying heteroskedasticity in the residuals; specifically, we graph the time series of mean squared residuals Σ_{i=1}^{n} ε̂²_it/n. There are a few spikes in the residuals of the case models, and more outliers in the death models, in the cross-country variability of the error terms; generally speaking, however, it moves in a modest range. We show the results for all four residual properties of the quadratic model for cases (1a) in Fig. 4 and for deaths (1b) in Fig. 5.

Total fatalities should be a fraction of the total cases reported, and fatalities should follow cases, at least at the individual level. For this reason we consider the following model, which imposes that the fatality curve is a delayed and shifted (because this is the log of cases) version of the case curve. Let y^d_it denote deaths and y^c_it denote cases, and suppose that

log(y^d_it + 1) = m_d(t) + ε^d_it, log(y^c_it + 1) = m_c(t) + ε^c_it, with m_d(t) = θ + m_c(t − k),

which is a special case of the model considered by Hardle and Marron (1990). This imposes restrictions across the coefficients of the two quadratic equations. The turning point for m_d occurs k periods after the turning point for m_c; that is, α_d = α_c + θ and µ_d = µ_c + k. The only equality restriction is that the γ parameter is the same for both cases and deaths. We can test this by comparing the t-statistic t̂ = (γ̂_d − γ̂_c)/se(γ̂_d − γ̂_c) with the standard normal critical values. Regarding the inequality restrictions α_d − α_c < 0 and µ_d − µ_c > 0, these can also be tested separately by similar t-statistics with one-sided critical values. The test results are presented below in two tables. The model does not fare well on either count (although the significance mostly disappears when HAC standard errors are used). We estimate the constrained model as described in the Appendix; the results are shown below graphically. For the UK the results look quite plausible (see Tables 7 and 8).

We calculate the ratio of expected total deaths to expected total cases (the fatality ratio) by country. At the time of writing, the UK has 0.136, France 0.170, Italy 0.132, Spain 0.101, Germany 0.045, Australia 0.052, and the USA 0.051. This may say more about the disparity between countries in testing than about the quality of treatment.

Epidemiological models often build in the effect of lagged cases on deaths, reflecting the natural causal ordering of cases and deaths. We consider an alternative model that combines that feature with our quadratic regression. Suppose that the log death regression is a weighted delay of the log cases regression,

m_d(t) = ∫₀^∞ w(s) m_c(t − s) ds.

In principle, if m_c and m_d were both nonparametric and estimable, one might be able to estimate the function w by sieve expansion, using the methods described in, inter alia, Chen (2007) and Chen and Christensen (2015). We take a parametric approach where both m functions are quadratic. This is possible provided the low-order moments of w are finite. We suppose that w is a constant ψ times the density of a normal random variable whose mean is ϑ and whose variance is one.
This shape seems plausible, since it implies a peak loading some days previous. We need the constants ϕ_j(ϑ) = ∫₀^∞ s^j φ(s − ϑ) ds, j = 0, 1, 2, so that ∫₀^∞ s^j w(s) ds = ψϕ_j(ϑ), where θ = (ψ, ϑ) ∈ R²; these constants are obtainable in closed form as integrals of normal densities (Barr and Sherrill, 1999), in terms of φ and Φ, the density function and c.d.f. of a standard normal random variable. This expresses the parameters of the death curve uniquely as a function of the parameters of the case curve and the two parameters ψ, ϑ. In particular, the location of the death peak is related to the case peak as follows:

µ_d = µ_c + λ(ϑ), where λ(ϑ) = φ(ϑ)/Φ(ϑ).

Since the Heckman correction λ(ϑ) ≥ 0, this implies that µ_d ≥ µ_c, as for the previous model. This equation allows the determination of ϑ, because the function λ(ϑ) = φ(ϑ)/Φ(ϑ) > 0 is monotonically decreasing; that is, ϑ = λ⁻¹(µ_d − µ_c). Whence ψ = c_d/(ϕ₀(ϑ)c_c). It follows that the quadratic model for both cases and deaths is compatible with the posited relation between deaths and cases. Regarding the peaks of the two curves, we have

m_d(µ_d) = ψϕ₀ m_c(µ_c) + ψ c_c (ϕ₂ − ϕ₁²/ϕ₀).

By the Cauchy-Schwarz inequality, ϕ₂ − ϕ₁²/ϕ₀ ≥ 0, so that with negative c_c we have m^d_max ≤ ψϕ₀ m^c_max. This model imposes exactly one restriction, like the shape-invariant model, but the nature of the restriction is quite different. We plan to investigate this model in the sequel.

There are many challenges in modelling the COVID data. Countries differ widely in their reporting methods and standards, which makes the data noisy and sometimes very unreliable. Our model does not impose any restrictions across countries on the evolution of the epidemic, and we find that there are very large differences across countries in all the key parameters. Another complication is due to the different interventions implemented in different countries at different times. In fact, most European countries introduced lockdown measures during March 2020, so that whatever effects these measures have had will be fully reflected in the subsequent data. Our model allows us to estimate future turning points of the curves without imposing much structure. It also allows the forecasting of the total number of cases, although the confidence intervals around such estimates are extremely large when proper attention is given to parameter uncertainty. The one-day-ahead forecasting record of our model has been quite good, discounting the problems arising from late and incorrect reporting of data; the main source of forecasting error was late reporting. On the other hand, longer-term forecasting has proved more challenging: we have consistently underestimated the likely total number of cases and deaths for the UK and US.

We have some specific findings from the empirical work that are worth commenting on. First, the shapes of the curves appear to be quite different across countries. Second, the timing of peak deaths in some countries precedes the peak of cases, which seems to be at odds with the epidemiological models; this may be because testing capacity has increased a lot and treatment has improved. Third, the characteristics of endgame countries vary a lot: being a small island seems to help, but countries as diverse as Luxembourg, China, South Korea, Israel, and Switzerland have also reached the endgame relatively quickly. There are also differences across countries in the ratios of deaths to cases. Part of this is due to how much testing is done (USA) and different death definitions (Russia), but there must be more factors at play, such as demographics, lifestyle choices, and social norms.
The models we have considered impose a single peak, although when we use a rolling window analysis a second peak will be detected from the updated sample. We may allow for multiple peaks in a global model by taking higher-order polynomial time trends; for example, quartic polynomials allow two peaks. We may parameterize this explicitly so that the double-peak model has twin peaks at µ₁ and µ₂, with the difference in peak heights given by γδ(µ₂ − µ₁)². Using this model we have found that South Korea, China, and Portugal show signs of a second peak, although the height of this second peak is an order of magnitude lower than the first round.

In all the data sources we have worked with there are sometimes clearly ''erroneous'' data, such as negative values. Health agencies sometimes report negative values to correct earlier mistakes, and they often do not break down the corrections by date but rather just post a single correction. For example, on 21/05/2020 the UK posted −515 new cases, while on 03/07/2020 it posted −29726. This was accompanied by the statement:

''We have updated the methodology of reporting positive cases, to remove duplicates within and across pillars 1 and 2, to ensure that a person who tests positive is only counted once. Methodologies between nations differ and we will be making future revisions to align approaches as much as possible across the 4 nations. Due to this change, and a revision of historical data in pillar 1, the cumulative total for positive cases is 30,302 lower than if you added the daily figure to yesterday's total. We will revise the methodology note explaining this in more detail in due course.''

We take the following steps to adjust our estimation for these data issues. Suppose that at dates t₁, . . . , t_r the outcome variable is negative for country i. We exclude y_{it₁}, . . . , y_{it_r} and estimate the model using the data excluding these observations. We then impute the missing observations by ŷ_{it_s} = exp(m̂(t_s))κ̂₀. Next we take the differences between the original negative values and the imputed ones, y_{it_s} − ŷ_{it_s}, and redistribute them equally across all the observations {y_it, t ≤ t_s}. Finally, we re-estimate the model with the full sample reflecting the level shift(s). We have also carried out quantile regressions instead, which provide some robustness to misrecording; the results are quite similar to the least squares ones.

Appendix. Here we consider standard errors for a regression function of the form m(t) = α + γ|t − µ|^λ + Σ_{j=1}^{6} β_j D_jt, where µ is the location of the peak and λ > 1 is a known parameter that measures the type of peak, while the D_jt are day-of-the-week dummies. We let θ = (α, γ, µ, β⊺)⊺, where β = (β₁, . . . , β₆)⊺, and let θ̂ minimize the sum of squared residuals. In fact we solve this in two steps: for given µ, (α, γ, β⊺)⊺ can be found by closed-form OLS estimation; we then do a grid search over µ. By standard theory for profile likelihood, this procedure is equivalent to the MLE. The score function components are the derivatives of m with respect to the parameters, namely 1, |t − µ|^λ, −λγ|t − µ|^{λ−1} sgn(t − µ), and D_t, where D_t = (D_1t, . . . , D_6t)⊺. We can drop the constant term λγ from the score for µ (assuming that γ ≠ 0). Define the K × 9 data matrix X whose rows stack these components. It follows (under i.i.d. error assumptions), as K → ∞, that θ̂ is asymptotically normal with variance σ²_ε M⁻¹, where M = X⊺X and σ²_ε is the error variance. Standard errors are computed as σ̂_ε √(diag(M⁻¹)), where σ̂²_ε is the residual variance. We use these standard errors for the quadratic and quartic vertex models; for the log model we just use the OLS standard errors.
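A sketch of the standard error computation just described; here the λγ factor in the score for µ is retained rather than dropped, and the inputs (the estimates, the K × 6 dummy matrix D, and the residual vector eps) are assumed to be evaluated at the fitted values.

```r
# NLLS standard errors for theta = (alpha, gam, mu, beta') in the vertex model.
# Rows of the K x 9 gradient matrix are
# (1, |t-mu|^lambda, -lambda*gam*|t-mu|^(lambda-1)*sign(t-mu), D_t').
vertex_se <- function(t, mu, gam, lambda, D, eps) {
  G <- cbind(1, abs(t - mu)^lambda,
             -lambda * gam * abs(t - mu)^(lambda - 1) * sign(t - mu), D)
  M <- crossprod(G)                      # G'G
  sqrt(mean(eps^2) * diag(solve(M)))     # sigma_hat * sqrt(diag(M^{-1}))
}
```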
The Gaussian log likelihood is, up to a constant, −K log(σ̂_ε(λ)). To compare across different λ we can consider the likelihood ratio statistic ℓ_λ − ℓ₂, where ℓ_λ = K log(σ̂_ε(λ)), which should be approximately χ² with one degree of freedom under the null hypothesis that the true λ = 2.

We discuss here our approach to obtaining prediction intervals. We have a classical linear regression with K observations, and

y_{K+s} − ŷ_{K+s|K} = (θ − θ̂)⊺x_{K+s} + ε_{K+s} ≃ N(0, σ²_ε x⊺_{K+s}(X⊺X)⁻¹x_{K+s}) + ε_{K+s},

where we assume that the two random variables are independent. As K → ∞, the parameter uncertainty is small for given s, but as s → ∞ the parameter uncertainty grows without bound (in our case, since x_{K+s} = (1, 1 + s/K, (1 + s/K)²)⊺). In fact, the estimator of m is only consistent in the range where s²/K → 0. Our intervals, or any intervals, are only valid over short horizons without strong additional assumptions or Bayesian magic. In our graphs above we have ignored the contribution from estimation uncertainty, which mostly affects long-term prediction intervals, and affects them dramatically. A simple approach is to assume normality for ε_{K+s}, in which case ε_{K+s|K} = y_{K+s} − ŷ_{K+s|K} ∼ N(0, σ²_ε(1 + x⊺_{K+s}(X⊺X)⁻¹x_{K+s})).

We provide a test of the vector of one-step-ahead predictions for n = 30 countries. The vector of one-step-ahead prediction errors has covariance matrix Ω_ε, the error covariance matrix, so that each standardized prediction error is approximately standard normal. Since n > K, we use a version of the Pesaran and Yamagata (2012) statistic τ_PY to aggregate across countries, which is asymptotically standard normal under the null (provided n is large and the cross-sectional dependence is weak); the test rejects if τ_PY > z_{1−α}, where Φ(z_a) = a. Here ⊙ denotes the Hadamard product and Ω̂_ε is the residual covariance matrix estimate. These are the statistics reported in Tables 3 and 4.

We next discuss how to estimate the restricted model of Section 5. In this case we have a pair of quadratic equations with an equality restriction,

log(y^c_t + 1) = α_c + β_c t + γt² + ε^c_t, log(y^d_t + 1) = α_d + β_d t + γt² + ε^d_t, var((ε^c_t, ε^d_t)⊺) = Ω.

This is a SUR with a cross-equation restriction, and the optimal estimator is a GLS. The unrestricted estimator vector θ̂ = (α̂_c, β̂_c, γ̂_c, α̂_d, β̂_d, γ̂_d)⊺ has variance Ω ⊗ (X⊺X)⁻¹. We define θ = (α_c, β_c, α_d, β_d, γ)⊺ and L the 6 × 5 matrix of zeros and ones that picks out the right elements, mapping the restricted parameter vector into the unrestricted one.

In our second model, the quasi-likelihood can be used, with one step taken from initial consistent estimators of θ = (a_c, b_c, c_c, ψ, ϑ)⊺, where, with ϕ₀(ϑ) = Φ(ϑ), ϕ₁(ϑ) = φ(ϑ), and ϕ₂(ϑ) = ½(1 − F_{χ²(3)}(ϑ²)), we have

m^d_t = ψ(a_c ϕ₀(ϑ) − b_c ϕ₁(ϑ) + c_c ϕ₂(ϑ)) + tψ(b_c ϕ₀(ϑ) − 2c_c ϕ₁(ϑ)) + t²ψ c_c ϕ₀(ϑ).

This is an SUR system with a nonlinear cross-equation restriction, and the efficient estimator should use the error covariance matrix. We can alternatively estimate the unrestricted model (a_c, b_c, c_c, a_d, b_d, c_d)⊺ and then impose the restrictions afterwards, by minimizing the distance between the unrestricted estimates and the values implied by the restricted parameters, for some norm. This is similar to the case we have already studied, because there are five unrestricted parameters and six estimable quantities.
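For the linear restriction of the first model, the minimum-distance step mentioned above has a closed form. A sketch follows, in which the weight matrix W is an assumption (the identity, or an estimate of the inverse covariance of the unrestricted estimator).

```r
# Minimum-distance imposition of the cross-equation restriction: the restricted
# vector theta = (alpha_c, beta_c, alpha_d, beta_d, gamma)' maps into the
# unrestricted 6-vector via the selection matrix L; given an unrestricted
# estimate th_u, the restricted estimate solves a weighted least squares problem:
# theta_hat = (L'WL)^{-1} L'W th_u.
restricted_md <- function(th_u, W = diag(6)) {
  L <- matrix(0, 6, 5)
  L[1, 1] <- 1; L[2, 2] <- 1   # alpha_c, beta_c
  L[4, 3] <- 1; L[5, 4] <- 1   # alpha_d, beta_d
  L[3, 5] <- 1; L[6, 5] <- 1   # common gamma in both equations
  solve(t(L) %*% W %*% L, t(L) %*% W %*% th_u)
}
```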
References.

Acemoglu et al. (2020). A Multi-Risk SIR Model with Optimally Targeted Lockdown.
Anastassopoulou et al. (2020). Data-based analysis, modelling and forecasting of the COVID-19 outbreak.
Barr and Sherrill (1999). Mean and variance of truncated normal distributions.
Campbell (1993). Intertemporal asset pricing without consumption data.
Chen (2007). Large sample sieve estimation of semi-nonparametric models.
Chen and Christensen (2015). Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions.
Chowell et al. (2019). A novel sub-epidemic modeling framework for short-term forecasting epidemic waves.
Chudik et al. (2020). Voluntary and Mandatory Social Distancing: Evidence on Covid-19 Exposure Rates from Chinese Provinces and Selected Countries.
Deb and Majumdar (2020). A time series method to analyze incidence pattern and estimate reproduction number of Covid-19.
Gozalo and Linton (2000). Local nonlinear least squares: Using parametric information in nonparametric regression.
Hafner (2020). The Spread of the Covid-19 Pandemic in Time and Space.
Hardle and Marron (1990). Semiparametric comparison of regression curves.
Hu et al. (2020). Artificial intelligence forecasting of Covid-19 in China.
Hualde et al. (2011). Gaussian pseudo-maximum likelihood estimation of fractional time series models.
Li et al. (2020). Trend and forecasting of the COVID-19 outbreak in China.
Li et al. (2020). Estimating the daily trend in the size of the COVID-19 infected population in Wuhan. medRxiv.
Linton, O.B. (2020). The Models and Methods of Financial Econometrics.
Liu et al. (2020). Panel Forecasts of Country-Level Covid-19 Infections.
Lo and MacKinlay (1990). An econometric analysis of nonsynchronous trading.
Marinucci and Robinson (2001). Semiparametric fractional cointegration analysis.
Park and Phillips (1999). Asymptotics for nonlinear transformations of integrated time series.
Peng et al. (2020). Epidemic analysis of COVID-19 in China by dynamical modeling.
Pesaran and Yamagata (2012). Testing CAPM with a large number of assets.
Read et al. (2020). Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions.
Richards (1959). A flexible growth function for empirical use.
Roosa et al. (2020). Real-time forecasts of the COVID-19 epidemic in China from …
Wang (2014). Martingale limit theorem revisited and nonlinear cointegrating regression.
Wang (2015). Limit Theorems for Nonlinear Cointegrating Regression.
Wu et al. (2020). Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study.