key: cord-0950958-xrv529bt
authors: Goliński, Adam; Spencer, Peter
title: Modeling the Covid‐19 epidemic using time series econometrics
date: 2021-08-24
journal: Health Econ
DOI: 10.1002/hec.4413
sha: 291330de0ac460a242d987d070b9c1207804349e
doc_id: 950958
cord_uid: xrv529bt

The classic “logistic” model has provided a realistic model of the behaviour of Covid‐19 in China and many East Asian countries. Once these countries passed the peak, the daily case count fell back, mirroring its initial climb in a symmetric way, just as the classic model predicts. However, in Italy and Spain and most other Western countries, the first wave of the epidemic was very different. The daily count fell back gradually from the peak but remained stubbornly high. The reasons for the divergence from the classical model remain unclear. We take an empirical stance on this issue and develop a model framework based upon the statistical characteristics of the time series. With the possible exception of China, the workhorse logistic model is decisively rejected against more flexible alternatives.

models can also be used to check the properties of the theoretical models and match them better to the data (Meenagh et al., 2009). Although epidemiological models may differ in many respects, they are all based on the same underlying theory and invariably predict that the daily counts for infections and deaths follow a bell-shaped curve. In other words, once the peak is passed and the daily count begins to fall, it follows a path that mirrors the upward climb, before coming to an end. In the large mechanistic models, this classic pattern follows from the dynamics of the epidemic, which naturally slows as the disease runs through the population and immunity increases. Precautionary behaviour on the part of the public and policymakers is also likely to be important in slowing the spread of the disease.
In the phenomenological models, the daily count follows a bell-shaped path by assumption, which means that the cumulative count follows a sideways S-shaped logistic curve. These symmetric dynamics have provided a reliable way of modelling outbreaks of influenza and other epidemics in the past. Indeed, simple regression models based on fitting the bell (or logistic) curve to the data also proved accurate in predicting the path of the Covid-19 outbreak in China and many East Asian countries (Batista, 2020; Jia et al., 2020). However, the experience of Italy and Spain, which was followed by the US, the UK and many other countries, has been very different. The daily mortality figures have fallen back gradually from the peak in these countries, but have remained stubbornly high. This contrast is apparent in the daily infections series plotted for China and for Italy in Figure 1.

A positive skew in the national time series can appear because they aggregate data for areas that are hit by the virus at different times. However, data for hospital admissions and fatalities in hard-hit regions and cities like Lombardy in Italy and New York in the US also exhibit a pronounced skew. A positive skew in the infections data may reflect measurement problems, such as improvements in the testing regime. It could also be due to the non-normality of community transmission. For example, the number of transmissions per person is known to have a long tail, due to the presence of "superspreaders". It has also been suggested that delays, such as the infection period, may have a gamma distribution with a long tail (Shen et al., 2020). Another plausible reason for the skew in mortality data is that the length of time from infection to death or recovery follows a gamma distribution (Bird, 2013; Hogg & Craig, 1978). Some people recover or regrettably die very quickly, but others take much longer.
This distribution is used by the MRC Biostatistics Unit to infer the true number of infections and the reproduction number (R) from mortality figures for the UK regions (Seaman & De Angelis, 2020). There is also a positive skew in the reporting lag (Bird & Neilsen, 2020; Birrell et al., 2020). Whatever the reason for this asymmetry, it is clear that the classic model has failed us badly this time. This has been documented by several recent studies. For example, Marchant et al. (2020) show that IHME forecasts are usually overtaken by the data within a few days. We show that the classical model also fails in many other countries.

Instead of delving deeper into the data to find the reasons for this failure, we take an empirical stance and develop a model that is based upon the statistical characteristics of the national time series. We model the daily mortality data for the first wave in 15 countries published by the European Centre for Disease Prevention and Control (ECDC). We use these data rather than infections because of the acute public and policy interest in them and because they are less prone to measurement error. To analyze the dynamics of an epidemic, we use the tools developed by econometricians to handle non-standard time series, such as those representing economic growth and speculative bubbles in financial markets. We compare the performance of the classical logistic model with flexible models based on the gamma and beta functions using an expanding data sample designed to mimic the data available to policymakers as an epidemic evolves in real time. At each stage, we identify the best fitting model using the Schwarz Information Criterion (SIC) and test forecasting performance using four-week projections. This exercise identifies three distinct phases. The logistic model systematically under-forecasts over all three phases, but initially, in the upswing phase of the epidemic, its in-sample fit compares reasonably well with its rivals.
But after a month or so, as the peak is passed and any asymmetries become apparent, the gamma and beta models fit much better than the logistic. Because they have greater flexibility to handle the initial stages of the epidemic, this is the case even in countries like Germany and Denmark that do not exhibit much of a skew. The gamma model is more robust than the beta in the face of data irregularities during this phase and generates more reliable forecasts. In the final phase, typically after a couple of months, the end-wave features become apparent. In most countries, mortality rates fall close to zero and the beta model with its distinct cut-off feature performs better than the gamma in terms of both in-sample fit and post-sample forecast accuracy. However, in the US, Brazil and Portugal, which move into the second wave without much of a hiatus, the gamma model still outperforms the beta over the full sample.

The next section of the paper sets out the three theoretical models and Section 3 explains how these are fitted to the data. The results are presented in Section 4. Section 5 offers some final observations and suggestions for future research.

Econometricians are used to dealing with difficult economic and financial time series. Their data often violate the classical assumptions adopted in the statistical texts and thus need to be handled using special techniques. For example, macroeconomic data like GDP exhibit exponential growth and financial prices can exhibit speculative bubbles that are explosive. They may respond with long and variable lags to policy interventions and exogenous shocks. These data may be measured with error and subject to structural shifts as behaviour or government policies change. As Castle et al. (2020) argue, epidemiological data are fraught with similar problems. These econometricians have used sophisticated linear trend fitting techniques to decompose the cumulative death counts.
They split each series into trend and remainder terms, then project them forward and recombine them to produce a forecast for the following week. As they note, a significant fall in outcomes relative to extrapolations from such models can be an indication that policy interventions are having the desired effect. Epidemiologists use similar models to separate the noise from the trend in the time series and use the trend to estimate the reproduction number R, the number of people an infected person is likely to infect. However, deviations from a linear trend can occur for many other reasons. For example, as the death toll mounts and people begin to worry about the virus and its consequences, they are likely to modify their behaviour in a way that reduces outcomes relative to a linear extrapolation. Longer term, the trend should bend as immunity builds up and the population becomes less susceptible to the disease. These endogenous feedback effects are built into the non-linear dynamics of the epidemiological models, which allow the trend to change as the epidemic progresses. This should in principle improve forecasts beyond the weekly horizon and make it more likely that systematic forecast errors are due to government interventions or other external influences.

These epidemiological models range from the large-scale computer models built by Imperial College and other modelling groups to simple data-based curve-fitting techniques. For example, many epidemiologists fit a logistic curve to the cumulative number of infections $C(t)$:

$$C(t) = \frac{K}{1 + Ae^{-rt}} \tag{1}$$

where $A = K/C(0) - 1$, $K$ is the final epidemic size and $r > 0$ the propagation or infection rate (see e.g., Batista, 2020, Equation (2)).
However, in view of the well-known issues around estimation with non-stationary data (Sims, 1980), we work with the differential form of this model. Differentiating (1) shows that the number of new cases at any time is a bell-shaped function of the accumulated cases:

$$dC(t) = rC(t)\left(1 - \frac{C(t)}{K}\right)dt \tag{2}$$

This model provides a simple way of allowing for the non-linear feedback mechanisms, loosely based on the SIR (susceptible, infectious, removed) model (Avery et al., 2020; Dimdore-Miles & Miles, 2020; Kermack et al., 1927). Initially, with $C(0)$ cases observed when the outbreak is detected, all of them are "infectious" and they will infect other "susceptible" people at the rate $r$ per unit of time ($dt$), causing $dC(0) = rC(0)\,dt$ new cases. Thus initially, the disease will spread exponentially, at the reproduction rate $\rho = dC(t)/(C(t)\,dt) = r$. However, various negative feedbacks then arise, which reduce the reproduction rate. The classic feedback mechanism is provided by herd immunity. If people who have had the disease are less susceptible to catching it again, then they move into the "removed" class. As they increase as a share of the population ($N$), the probability that an infectious person will meet a susceptible one falls from 1 to $(1 - C(t)/N)$. This results in $rC(t)(1 - C(t)/N)\,dt$ new cases per unit of time, resulting in (1) with $K = N$.

However, there is a problem with this interpretation. If this were the only mechanism at work, we would expect $K$ to be of a similar size to the population $N$. But it is much smaller than $N$ empirically, suggesting that $C$ is under-recorded. For example, Dimdore-Miles and Miles (2020) assume that the number of new cases that are symptomatic and recorded is a fraction $\phi$ of the true number. If $C$ represents the true number and $C^o$ the recorded number, then substituting $C = C^o/\phi$ into (2) gives the model:

$$dC^o(t) = rC^o(t)\left(1 - \frac{C^o(t)}{\phi K}\right)dt$$

Thus the estimator $\phi K$ effectively replaces $K$. However, as they conclude, the value of $\phi$ would need to be extremely low to fully explain the low value of this estimate.
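As a quick numerical check of the link between the logistic curve (1) and its bell-shaped differential form (2), the following Python sketch compares a finite-difference derivative of the closed-form curve with the drift $rC(1 - C/K)$. The parameter values (K = 1000, r = 0.1, C(0) = 1) and the function names are illustrative assumptions, not estimates or code from the paper.

```python
import math

def logistic_cumulative(t, K, r, C0):
    """Closed-form logistic curve C(t) = K / (1 + A*exp(-r*t)), with A = K/C0 - 1."""
    A = K / C0 - 1.0
    return K / (1.0 + A * math.exp(-r * t))

def bell_drift(C, K, r):
    """New cases per unit of time as a function of cumulative cases: r*C*(1 - C/K)."""
    return r * C * (1.0 - C / K)

# Made-up illustrative values (assumption, not taken from the paper)
K, r, C0 = 1000.0, 0.1, 1.0

def max_relative_gap(h=1e-4):
    """Largest relative gap between a central-difference dC/dt and the bell drift."""
    worst = 0.0
    for t in range(0, 150, 5):
        numeric = (logistic_cumulative(t + h, K, r, C0)
                   - logistic_cumulative(t - h, K, r, C0)) / (2.0 * h)
        exact = bell_drift(logistic_cumulative(t, K, r, C0), K, r)
        worst = max(worst, abs(numeric - exact) / exact)
    return worst

print("max relative gap:", max_relative_gap())
print("drift at C = K/2:", bell_drift(K / 2.0, K, r))  # the single peak of the bell
```

The drift is largest at $C = K/2$ and is symmetric around that point, which is the symmetry the text goes on to question.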
Precautionary feedbacks can also help to reduce the reproduction rate, as argued in the introduction. For example, as $C$ grows, people are likely to modify their behaviour in a way that mimics the effect of immunity, reducing the reproduction rate $\rho = r(1 - C(t)/K)$ via the $K$ parameter. This behaviour can be reinforced by government interventions like lockdown. On a more pessimistic view, if immunity from exposure to the disease is partial or tends to fall with the time since exposure, or indeed if the precautionary response depends upon the recent rather than the cumulative number of cases, then there may not be an upper limit to the cumulative number of cases.

The logistic model is designed to explain the transmission of a virus within a closed community. But apart from the country where the virus originates, all the initial cases will involve people who have recently entered the country, and the number of new cases $n$ will be related to the number of new arrivals rather than $C$. Thus in the initial stages, before community transmission begins, $dC(0) = n\,dt$ and not $dC(0) = rC(0)\,dt$ as implied by the logistic model.

However, the main problem with this model is that the bell and logistic curves are symmetric. The bell curve has a single peak at $C = K/2$. Once this is passed, the number of new cases begins to fall, following a path that mirrors the upward climb, before slowing to a stop as $C$ approaches $K$. This was in fact the experience of China and many East Asian countries, which is why the logistic curve fits their data well. Unfortunately, the experience in many other countries has been very different. The number of deaths fell back from the peak, but then remained stubbornly high. We need a more flexible model to allow for these possible effects. Mathematically, we can achieve this simply by raising the $C$ and $(1 - C/K)$ terms in (2) to the powers $\alpha$ and $\beta$.
This makes it a beta function, which is much more flexible:

$$dC(t) = rC(t)^{\alpha}\left(1 - \frac{C(t)}{K}\right)^{\beta}dt \tag{3}$$

Alternatively, suppose that the negative feedback effects mean that the reproduction rate $\rho$ follows an exponential rather than a power law as the number of cases mounts: $\rho = r\exp[-\gamma C(t)]$. Then:

$$dC(t) = rC(t)\exp[-\gamma C(t)]\,dt$$

Importantly, in this model, there is no upper limit on the total number of cases as there is in the logistic and beta models. This makes it easier to fit to data sets in which the number of cases is falling but remains high, making it difficult to estimate the end-point parameter $K$. Theoretically, as noted, there could be situations in which the disease becomes endemic and there is no limit to the cumulative number of cases. The performance of this exponential feedback model can be improved by changing the power of the $C$ term to $\alpha$, thus giving the trend a form similar to that of the gamma density function:

$$dC(t) = rC(t)^{\alpha}\exp[-\gamma C(t)]\,dt \tag{4}$$

This function is used extensively in statistics to describe probability distributions, the well-known $\chi^2$ distribution being a special case (Mood et al., 1973). Its mathematical properties are reviewed in the Appendix.

These processes are non-stationary and should be handled using techniques developed for modelling non-stationary economic data, like growth and inflation. Their dynamics are dictated by stochastic differential equations with drift (i.e., trend) and volatility terms, like those used to model interest rates (Ait-Sahalia, 1996). We give the volatility term a form that is congruent with the drift.

The logistic model outlined in the previous section was originally developed to explain the number of new infections. However, in the absence of mass testing, the true numbers of people who are infected and those who have recovered are likely to be much larger than those recorded, especially if there is a large proportion of asymptomatic cases. To avoid these measurement problems, we extend the reasoning of the logistic infections specification to track deaths instead, following Murray (2020) and many others.
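To make the contrast between the beta-type and gamma-type trends concrete, here is a minimal Python sketch of the two drift functions and a deterministic forward-Euler path for each. It ignores the volatility term entirely, and the parameter values, the step size and the function names are illustrative assumptions rather than anything estimated in the paper.

```python
import math

def beta_drift(C, r, K, alpha, beta):
    """Beta-type drift r * C**alpha * (1 - C/K)**beta; zero once C reaches K."""
    if C >= K:
        return 0.0
    return r * C ** alpha * (1.0 - C / K) ** beta

def gamma_drift(C, r, gamma, alpha):
    """Gamma-type drift r * C**alpha * exp(-gamma*C); positive for every C > 0."""
    return r * C ** alpha * math.exp(-gamma * C)

def simulate(drift, C0, days, steps_per_day=20):
    """Forward-Euler path of dC = drift(C) dt, recorded once per day."""
    dt = 1.0 / steps_per_day
    C, path = C0, [C0]
    for _ in range(days):
        for _ in range(steps_per_day):
            C += drift(C) * dt
        path.append(C)
    return path

# Made-up illustrative values (assumption). With alpha = beta = 1 the beta
# drift collapses to the logistic (bell) drift r*C*(1 - C/K).
beta_path = simulate(lambda C: beta_drift(C, 0.1, 1000.0, 1.0, 1.0), 1.0, 300)
gamma_path = simulate(lambda C: gamma_drift(C, 0.1, 0.005, 1.0), 1.0, 300)

print("beta path ends at: ", round(beta_path[-1], 3))   # levels off at K
print("gamma path ends at:", round(gamma_path[-1], 3))  # still creeping upwards
```

In this sketch the beta path is capped at $K$, while the gamma path keeps rising slowly, which mirrors the point that the exponential feedback model places no upper limit on the cumulative count.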
Suppose for example that deaths $D(t)$ represent a constant lagged fraction of the true number of infections $C(t)$. Substituting this into (4) and suitably reinterpreting the parameters:

$$dD(t) = rD(t)^{\alpha}\exp[-\gamma D(t)]\,dt \tag{5}$$

We discretize (5), using $d_t$ to represent $dD(t)$ and $D_t$ to represent the cumulative number of deaths. In the empirical models we use ECDC data for daily mortality rates, which express the daily death count as a constant share of the population. This has the effect of normalizing the data to allow for country size, which is particularly important in cross-country comparisons such as those conducted in Section 4.5. We use a rolling weekly average of the daily mortality rate instead of the daily series. This has the effect of smoothing out the erratic day-to-day movements often seen in the raw data as well as the weekend reporting lag seen in the US and several other countries. Specifically, denote the reported number of daily deaths in day $t$ by $d_t^o$. We calculate its weekly moving average, $d_t$. Finally, we add a congruent volatility specification, in which the drift parameters ($r$, $\alpha$ and $\gamma$) and the volatility parameters are to be estimated and the error term is Gaussian. These variables are measured as a share of the population, per 10 million people. Similarly, we specify a beta model corresponding to (3); setting $\alpha = \beta = 1$ simplifies this to the logistic specification. We found that a further parameter of these specifications was close to 0.75 for these models and countries, and fixed it at this value in the regressions reported here.

The previous section identifies three candidate models (logistic, gamma and beta). We estimate these models using data for the first wave provided by the ECDC. This source provides daily death (and infections) data from 1st January to 14th December 2020, when the ECDC discontinued the daily series due to the effects of retrospective corrections, delays in reporting and similar problems. To select the best model, we consider both the in-sample fit and (apart from China) the post-sample forecasting performance.
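The smoothing and accumulation steps described above can be sketched in a few lines of Python. The text does not say whether the weekly window is trailing or centred, so the trailing seven-day window used here is an assumption, as are the function names and the made-up input series.

```python
def weekly_average(daily, window=7):
    """Trailing moving average; the first window-1 days have no value and are dropped.
    A trailing (rather than centred) window is an assumption of this sketch."""
    return [sum(daily[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(daily))]

def cumulative(series):
    """Running total of a daily series."""
    total, out = 0.0, []
    for x in series:
        total += x
        out.append(total)
    return out

# Made-up reported counts with no deaths reported at weekends (illustrative only)
reported = [10, 10, 10, 10, 10, 0, 0] * 3
smoothed = weekly_average(reported)

print(smoothed[:3])            # the weekend dips are averaged out
print(cumulative(smoothed)[-1])
```

Because every seven-day window contains exactly one weekend, the smoothed series is flat in this example even though the raw series drops to zero twice a week.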
To rank the models by fit we use the SIC, which adjusts the likelihood value appropriately for the number of parameters and observations to guard against over-fitting. To avoid the bias in estimates caused by an integer data count, we start the estimation for each country from the date when the cumulative number of deaths exceeds 1.5 per 10 million people. The end of sample for each country is determined by the end of the first wave. Specifically, we end it at the beginning of the two-weekly period with the minimum death toll, counting from the beginning of the sample (see the tables for more detail). We then check that the best fitting model tends to provide the best forecasts. Finally, we report the full sample parameter estimates as well as the estimates from an eight-week data sample, used to represent an on-going epidemic.

Table 1 shows the regression results obtained for the three rival models using the data for China. This was of course the first country to be hit by Covid-19 and managed to suppress it effectively by the end of March, when we end this sample. Figure 2 shows the in-sample fit of the logistic (red line), gamma (green line) and beta (blue line) regression models. As noted in the introduction, the bell-logistic model represents the behaviour of this outbreak nicely, although the beta drift model is better in terms of statistical criteria and its parameter estimates indicate a small positive skew in the data.

We next analyze the daily mortality data for Italy and Spain, which were the first western countries to be overwhelmed by the virus and where the skew in the mortality figures first became apparent. The top panel of Figure 3 shows how the models fit the weekly average of the daily death data over the full sample. Table 2 reports the full sample parameter estimates. The beta model performs best for both countries over the full period.
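The two selection rules described above, the information criterion and the sample start date, can be sketched as follows. The SIC is computed in its standard form, $-2\ln L + k\ln n$, where lower values are preferred. For simplicity the log-likelihood here is a homoskedastic Gaussian one, which is an assumption of this sketch and not the paper's congruent volatility specification; the function names and toy numbers are also assumptions.

```python
import math

def gaussian_loglik(residuals):
    """Maximised Gaussian log-likelihood with the variance at its ML estimate.
    Constant variance is a simplifying assumption of this sketch."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

def sic(loglik, n_params, n_obs):
    """Schwarz Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def first_index_above(cumulative_per_10m, threshold=1.5):
    """Index of the first day on which the cumulative count exceeds the threshold."""
    for i, value in enumerate(cumulative_per_10m):
        if value > threshold:
            return i
    return None

# Toy residuals (illustrative only): same fit, different parameter counts
ll = gaussian_loglik([1.0, -1.0, 1.0, -1.0])
print(sic(ll, 3, 4), sic(ll, 4, 4))
print(first_index_above([0.2, 0.9, 1.5, 1.6, 3.0]))
```

With an identical likelihood, the model with the extra parameter receives the higher (worse) SIC, which is the over-fitting penalty the text refers to.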
Arguably, a more relevant test is to ask how well these models fit and forecast as the epidemic evolves in real time, helping to inform the tightness of government policy measures. To assess this, we estimate a weekly series of regressions with an expanding time window. In the second row panel of Figure 3, we plot the SIC value for each weekly regression for the three models. This exercise identifies three distinct phases. Initially, the SIC finds it difficult to discriminate between these models. That is likely to be because they only differ in the way that they represent the negative feedback effects, which are not very powerful initially, making it very difficult to predict the final death toll. However, after another month, the gamma and beta models begin to outperform the logistic. This outperformance becomes more pronounced as the peak is passed and the data become more informative about the negative feedback effects. As noted in the introduction, this is because the daily mortality figures during the first wave were asymmetric: the downswing was more gradual than the initial upswing. Consequently, the logistic model, being symmetric, fits poorly during this phase and systematically under-predicts the subsequent number of deaths.

The poor forecasting performance of the logistic model is clear from the "hedgehog" forecast charts shown in the third row panel. To construct this type of chart, we use parameter estimates from the expanding weekly sample models to make a succession of four-week ahead forecasts. The dots in this chart, which form the back of the hedgehog, show the cumulative number of deaths observed at the end of each week. The forecasts are shown as the spines of the hedgehog. After a couple of months, we identify a third and final phase, in which the beta model begins to outperform the gamma model.
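The mechanics of a "hedgehog" chart can be sketched as below: for each week, take the last observed cumulative count and the drift function fitted on the sample up to that week, and iterate the drift forward 28 days. The estimation step itself is omitted, so the drift functions passed in are stand-ins with made-up parameters; the function names and the simple daily recursion are assumptions of this sketch.

```python
def project(drift, D_last, horizon_days=28):
    """Iterate D_t = D_{t-1} + drift(D_{t-1}) for horizon_days steps (one spine)."""
    path, D = [], D_last
    for _ in range(horizon_days):
        D = D + drift(D)
        path.append(D)
    return path

def hedgehog(weekly_fits, horizon_days=28):
    """weekly_fits: list of (observed cumulative deaths at week end, fitted drift).
    Returns one (dot, spine) pair per week."""
    return [(D_obs, project(drift, D_obs, horizon_days))
            for D_obs, drift in weekly_fits]

# Stand-in for a fitted logistic drift with made-up parameters (assumption)
def logistic_drift(D, r=0.1, K=1000.0):
    return r * D * (1.0 - D / K)

spines = hedgehog([(100.0, logistic_drift), (500.0, logistic_drift)])
for dot, spine in spines:
    print(dot, "->", round(spine[-1], 1))
```

In practice each week would carry its own re-estimated parameters, so the spines would fan out differently as the sample expands.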
This is because the logistic and beta functions have a well-defined end value ($K$) for the final death toll, relative to population (which the gamma does not), terminating the wave decisively. To illustrate the way that these models represent an on-going epidemic, Table 4 reports a set of parameter estimates for the eight-week data sample. The gamma and beta models both perform well at this point, significantly outperforming the logistic. Note in particular the very low $K$ values for the final death toll implied by the logistic model. The $K$ values from the beta model provide a much more realistic projection of the final outcome. The bottom panel of Figure 3 shows how these models fit the four- and eight-week data samples.

We next look at the US and the UK, two of the countries with the highest mortality rates during the first wave. The top panels of Figure 4 show how the models track the full sample, the two central panels plot the SIC values and four-week forecasts for the weekly regression models, and the bottom panels show the four- and eight-week fit. The parameter estimates are reported in Tables 2 and 4. Once again, we see an initial phase in which all three models have a similar fit, followed by a steady outperformance of the gamma and beta models in terms of both the fit and the forecasting performance. In the UK, we identify a third and final phase after around two months, in which the beta clearly outperforms the gamma model in these respects and the logistic does as well as the gamma model. The UK resembles the Italian experience in this sense.
[Figure caption: In-sample fit (top row), Schwarz Information Criterion (SIC) and rolling out-of-sample forecasts of cumulative deaths calculated on an expanding window (second and third central rows, respectively) and dynamic fit (bottom row) for Italy (left) and Spain (right)]

However, the US experience was then very different because the weekly death toll never reached the lows seen in these countries over the late summer months. The beta and gamma models perform equally well for the US until the mortality figures begin to move back up again in September, marking the beginning of the second wave. It is perhaps worth noting that the curves generated by these two US models overlap and smooth out the lumpy figures announced around the peak of the epidemic. The erratic nature of these data makes forecasting very difficult and leads to large errors over the following month. However, the subsequent forecasting performance of the US gamma model is particularly impressive.

TABLE 2 Full sample estimates of the logistic, gamma and beta models

We estimate these models for eight other West European countries: France, Germany, Belgium, Denmark, Ireland, the Netherlands, Portugal and Sweden. We also estimate models for Canada and Brazil. Performance is shown graphically in Figures 6-9. The parameter estimates are reported in Tables 2-5. The patterns observed in Italy, Spain, the US and the UK are also apparent in these figures. The SIC reliably selects the model that provides the best forecasts. With the exception of Brazil and Portugal, which are similar to the US in progressing directly into a second wave, the beta drift model fits much better than the other models over the full period. However, over the first two or three months the gamma drift model invariably fits as well, if not better, and provides more robust forecasts than the beta. The gamma model fits and forecasts remarkably well for Brazil.
In this case, the very low eight-week estimate of $\gamma$ stands out, indicating that the negative feedback effects are still very weak at this point, with the epidemic still in the upswing phase. The logistic model systematically under-forecasts the spread of the virus in all these countries, unless and until the epidemic ends.

TABLE 3 Full sample estimates of the logistic, gamma and beta models

These patterns stand out despite the very different experiences of these countries during the first wave. Germany successfully combined a lockdown with mass population testing and had a much lower mortality rate than France (Figure 5) and other countries. Denmark was the second European country after Italy to go into lockdown, on 11 March, before any fatalities had occurred. Its peak mortality rate was similar to that seen in Germany and much lower than in its neighbour Sweden, which was exceptional in having relied upon individual responsibility rather than lockdown to contain the spread of the virus.

TABLE 5 Estimates of the logistic, gamma and beta models at the 8-week stage

Time series models are designed to abstract from the noise in the data and provide estimates of the trend in the series. In the case of an on-going, non-linear process like an epidemic, they can also be used to indicate the rate at which the trend is increasing or decreasing and whether this rate is accelerating or decelerating. In this case, the trend in the cumulative death toll is the estimated number of daily deaths, described by a drift function like Equation (7). These properties can be seen from the shape of the curves in Figures 3-9. However, rather than "eyeballing" charts it is often better to look at the results numerically, using well-known statistics, particularly when comparing different countries. Table 6 shows some of the basic numbers that describe the spread of the epidemic in the US, Canada and West European countries at the eight-week stage.
We use the gamma drift model to represent this. Its parameters arguably allow a broader assessment of the characteristics of the virus than comparisons of death tolls in different countries. These statistics follow from the well-known mathematical properties of gamma-type functions, which are reviewed in the Appendix. The first three columns show the parameters estimated for each country, reproduced from Tables 4 and 5. These are first used to determine when the peak in the death toll is likely to have occurred. This can be difficult to gauge from the data visually, especially in countries like the US where the data are lumpy around the peak due to measurement problems. The table shows the date that the peak was reached in each country, the estimated number of daily deaths at the peak (corresponding to the height of the peak in each figure) and the cumulative number of deaths at that point. We then calculate the "skewness" coefficient for each country, shown in column (vii). This indicates how different the decline from the peak was compared to the rise from the first few cases to the peak. The rows of this table are ranked in terms of skewness, starting with Belgium at the top.

One of the stand-out features of this table is the strong positive correlation between the skew and the daily ($R^2 = 0.94202$) and especially the cumulative ($R^2 = 0.99999$) mortality rates at the peak. Countries like Germany and Denmark that are judged to have dealt with the epidemic effectively compare very favourably in all of these respects with those like the UK and Belgium that were not. More interesting is the observation that all three variables are negatively correlated with the parameter $\gamma$, which of course acts as an indicator of the strength of the negative feedback effects. This suggests prima facie that the negative feedback effects are largely due to the effectiveness of government policy and precautionary behaviour by the public rather than herd immunity.
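One way to see how peak statistics follow from the gamma drift parameters is to note that a drift of the form $rD^{\alpha}\exp(-\gamma D)$ is maximised where its derivative with respect to $D$ vanishes, which gives $D^* = \alpha/\gamma$. The Python sketch below evaluates the peak location and height under that drift. It is a derivation under the stated functional form, not a reproduction of the paper's Appendix or Table 6, and the parameter values are made up for illustration.

```python
import math

def gamma_drift(D, r, alpha, gamma):
    """Gamma-type drift: estimated daily deaths as a function of cumulative deaths D."""
    return r * D ** alpha * math.exp(-gamma * D)

def gamma_peak(r, alpha, gamma):
    """Return (cumulative deaths at the peak, daily deaths at the peak).
    Setting d/dD [alpha*ln(D) - gamma*D] = 0 gives D* = alpha / gamma."""
    D_star = alpha / gamma
    return D_star, gamma_drift(D_star, r, alpha, gamma)

# Made-up illustrative parameters (assumption, not estimates from the paper)
D_star, height = gamma_peak(r=0.2, alpha=1.2, gamma=0.002)
print("cumulative deaths at peak:", D_star)
print("daily deaths at peak:     ", round(height, 3))
```

Under this form, a smaller $\gamma$ pushes the peak to a higher cumulative count, which is in line with the negative correlation between $\gamma$ and the peak mortality rates noted in the text.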
Indeed, if we take the inverse of the gamma parameter (effectively expressing the exponent in (5) as $D/K$, as in the other two functions, rather than $\gamma D$), its correlation with the skew ($R^2 = 0.91409$), the daily ($R^2 = 0.99413$) and cumulative ($R^2 = 0.99371$) mortality rates is also very high. Importantly, this does not necessarily follow from the gamma function, since, for example, Equation A1 shows that the skew depends upon both the $\alpha$ and the $\gamma$ parameters. It only follows in Table 6 because the variation in $\alpha$ is small compared to the variation in $\gamma$.

Although this research project was initially aimed at modelling the features of the stubborn upper tail seen in Italy and elsewhere, these results also improve our understanding of the early, exponential, phase of this pandemic. In this early, pre-peak phase, attention is focussed on the time that it takes for the cumulative number of infections to double. One of the striking features of Tables 2 and 3 is that the estimates of the parameter $\rho(1) \approx r$ from the bell-logistic model are remarkably close, ranging from 0.07 for Sweden to 0.11 for Belgium. This parameter is important because it shows the daily growth rate during the initial phase of the epidemic, before the various negative feedback effects are significant. Thus the logistic model would suggest that country-specific factors are not significant in explaining the initial spread of the virus. However, the gamma drift model, which allows more flexibility in tracking the initial phase through the parameter $\alpha$, suggests that the logistic model covers up important idiosyncratic effects. Table 6 shows the values of $\rho(0.5)$ and $\rho(1.5)$ and the respective doubling times during the initial phase. With $D = 0.5$ the doubling time ranges widely, from less than a day in Denmark to 3.14 days for France, in strong contrast to the impression given by the logistic model.
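The link between a reproduction rate and a doubling time can be illustrated with a short Python sketch. It assumes the reproduction rate is the drift divided by the cumulative count, $\rho(D) = rD^{\alpha-1}\exp(-\gamma D)$, and that growth compounds continuously, so the doubling time is $\ln 2/\rho$. Both are assumptions of this sketch and the paper's Table 6 may use a different convention; the gamma-model parameter values are made up.

```python
import math

def reproduction_rate(D, r, alpha, gamma):
    """rho(D) = drift / D = r * D**(alpha - 1) * exp(-gamma * D)."""
    return r * D ** (alpha - 1.0) * math.exp(-gamma * D)

def doubling_time(rho):
    """Days needed to double at a constant continuous growth rate rho per day."""
    return math.log(2.0) / rho

# Range of logistic growth rates quoted in the text (0.07 and 0.11 per day)
for r in (0.07, 0.11):
    print(r, "->", round(doubling_time(r), 1), "days")

# Made-up gamma-model parameters (assumption): alpha < 1 raises rho at small D
rho_early = reproduction_rate(0.5, r=0.1, alpha=0.8, gamma=0.001)
print(round(rho_early, 4), "->", round(doubling_time(rho_early), 1), "days")
```

Under these assumptions the logistic range of 0.07 to 0.11 corresponds to doubling times of roughly 9.9 and 6.3 days, whereas a value of $\alpha$ below one shortens the doubling time at low cumulative counts.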
This paper shows how the econometrician's toolkit can be used to develop a simple reduced form model of the time series generated by an epidemic. We illustrate this using daily mortality data generated by the first wave of Covid-19 for 14 American and European countries. We use standard model selection techniques to find the model that best fits in-sample at any stage of the epidemic and show that this reliably generates the most accurate post-sample forecasts. With the exception of China, the logistic model frequently employed by epidemiologists to model time series data is decisively rejected against the more flexible gamma and beta models. These handle the very different wave characteristics seen in these countries remarkably well.

[Figure caption: In-sample fit (top row), Schwarz Information Criterion (SIC) and rolling out-of-sample forecasts of cumulative deaths calculated on an expanding window (second and third central rows, respectively) and dynamic fit (bottom row) for Canada (left) and Brazil (right)]

These time series models provide useful statistics that summarize the reproduction, morbidity and mortality rates in different countries. We could use these to look at the effects on these indicators of variations in containment and testing strategies across a cross-section of countries, while controlling for different demographic and other characteristics. One of the interesting findings that emerges from the present study is that there is a strong correlation between parameters such as $\gamma$ and $K$, which represent the strength of the negative feedback, and the skew and peak mortality rates. The epidemic was less severe in countries like Germany, Denmark (and indeed China) that were generally regarded as being effective in dealing with the virus than it was in others like Belgium and the UK that were not, and it seems likely that these feedback parameters reflect the efficacy of government policy.
However, such reduced form models have their limitations. Their dynamics are the result of a convolution of the long and possibly variable lag distributions involved in the data generation process, and it is impossible to unravel these without embedding them in a large-scale structural model. In the present context, despite the prima facie argument that the negative feedback results from policy and precautionary behaviour rather than from the build-up of immunity, we cannot be sure. Nor can we say whether the skew seen in the mortality data in many Western countries during the first wave was due to the skew in the lag from infection to reinfection, the lag from infection to death or other effects. The difference between the experiences of these countries and East Asian countries remains to be explained.

Nevertheless, looking forward, we can potentially use reduced form models to identify structural breaks and perform similar tasks. Econometricians have a variety of tools for this kind of work, including tests for discrete changes when the break-point is unknown a priori (e.g., due to the advent of a new variant) as well as tests for breaks at points when a change is likely to have occurred (e.g., due to a policy intervention). The explosive behaviour test proposed in a series of papers by Phillips, Shi and Yu (Phillips et al., 2014, 2015) also looks potentially useful in this respect. This was developed for testing for bubbles in financial time series, which are analogous to the early phase of a new virus or variant.

We are grateful to Karl Claxton, Nigel Rice and Luigi Siciliani for insightful comments. We are also indebted to a referee for helpful suggestions.

The authors have no conflict of interest to declare.
The data that support the findings of this study were derived from the following resources available in the public domain: https://covidtracking.com/data, https://www.ecdc.europa.eu/en/covid-19.

Adam Goliński https://orcid.org/0000-0001-8603-1171

Testing continuous time models of the spot interest rate
Policy implications of models of the spread of coronavirus: Perspectives and opportunities for economists
Estimation of the final size of the coronavirus epidemic by the logistic model
Nowcasting of Covid-19 deaths in English hospitals. mimeo
Nowcasting
The Bank of England's forecasting platform: COMPASS, MAPS, EASE and the suite of models
Short-term forecasts of COVID-19 - preliminary version. mimeo, Department of Economics
Assessing the spread of the novel coronavirus in the absence of mass testing
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. mimeo
Introduction to mathematical statistics
Prediction and analysis of coronavirus disease 2019. mimeo
Healthcare cost regressions: Going beyond the mean to estimate the full distribution
A contribution to the mathematical theory of epidemics
Learning as we go: An examination of the statistical accuracy of COVID19 daily death count predictions. mimeo
Testing a DSGE model of the EU using indirect inference
Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months
IHME COVID-19 health service utilization forecasting team.
Specification sensitivity in right-tailed unit root testing for explosive behaviour
Testing for multiple bubbles: Historical episodes of exuberance and collapse in the S&P 500
Challenges in estimating the distribution of delay from COVID-19 death to report of death
Review of Ferguson et al. "Impact of non-pharmaceutical interventions"
Macroeconomics and reality

The mathematical properties of gamma-type functions are well known. This function is used to define the gamma probability distribution (Mood et al., 1973). Health economists are also used to dealing with skewness and kurtosis. For example, Jones et al. (2015) use generalized gamma and beta functions to handle the tails in health cost distributions. The value of the drift in (6) at any time gives the expected number of deaths and is obtained by substituting the cumulative number of deaths at that time. However, analytically, it is often easier to work in continuous rather than discrete time, using (5) instead of (6).

Gamma-type functions have a single peak. This is found by taking the first derivative of the drift and equating it to zero. The resulting values are shown in the first panel of Table 6, and substituting them back into (5) gives the estimate of the daily death toll at the peak. The peak corresponds to the mode in the gamma distribution. Similarly, the skewness coefficient shown in column (vii) follows from that of the gamma distribution (Mood et al., 1973).

Although this paper is focussed on the features of the stubborn upper tail, our results also help us to better understand the early, exponential phase of this pandemic. In that phase, attention is focussed on the time that it takes for the cumulative number of infections to double. This can also be used to assess the initial behaviour of the mortality rate. But to gauge that we need an estimate of the reproduction rate at $D(t)$.
Table 6 shows three estimates implied by the gamma drift function (7), each expressed as the expected daily change as a decimal fraction of the cumulative count. The first is evaluated at the start of the sample with $D = 1.5$, suggesting that it is in the range 0.2 to 0.6 for most countries. The second is a pre-sample extrapolation value for $D = 0.5$. The third is the value at the peak. We can then calculate the doubling time by dividing (5) by $D$ to get the percentage or logarithmic change: