key: cord-0473139-n7lmr2qj authors: Antulov-Fantulin, Nino; Bottcher, Lucas title: On the accuracy of short-term COVID-19 fatality forecasts date: 2021-07-21 journal: nan DOI: nan sha: 78a6699e8554af4b70ac5681cbd90b8c0a3c07bd doc_id: 473139 cord_uid: n7lmr2qj

Forecasting new cases, hospitalizations, and disease-induced deaths is an important part of infectious disease surveillance and helps guide health officials in implementing effective countermeasures. For disease surveillance in the U.S., the Centers for Disease Control and Prevention (CDC) combine more than 65 individual forecasts of these numbers in an ensemble forecast at national and state levels. We collected data on CDC ensemble forecasts of COVID-19 fatalities in the United States and compared them with easily interpretable "Euler" forecasts, which serve as a model-free benchmark that is based only on the local rate of change of the incidence curve. The term "Euler method" is motivated by the eponymous numerical integration scheme that calculates the value of a function at a future time step based on the current rate of change. Our results show that CDC ensemble forecasts are not more accurate than "Euler" forecasts on short-term forecasting horizons of one week. However, CDC ensemble forecasts perform better on longer forecasting horizons. Using the current rate of change in incidences as an estimate of future incidence changes is useful for epidemic forecasting on short time horizons. An advantage of the proposed method over other forecasting approaches is that it can be implemented with a very limited amount of work and without relying on additional data (e.g., human mobility and contact patterns) or high-performance computing systems.

Over the course of the COVID-19 pandemic, more than 65 international research groups contributed to an ensemble forecast of reported COVID-19 cases, hospitalizations, and fatalities in the U.S. [1].
These forecasts are a central source of information on the further development of the pandemic and are used by various governmental and nongovernmental entities, including the Centers for Disease Control and Prevention (CDC) [2]. Different forecasting methods rely on different underlying models and assumptions. One may roughly divide forecasting models into three classes: (i) mechanistic models [3, 4], (ii) purely data-driven models [5], and (iii) hybrid models. Most classical epidemic models are mechanistic and aim at describing disease dynamics in terms of interacting individuals in a population. Such models are usually applied to describe the influence of certain factors (e.g., population density, demographics, contact patterns, and mobility) on the dynamics of an epidemic. Data-driven or machine learning models make fewer assumptions about the underlying dynamics and are applicable to a broader range of forecasting problems, but they come at the cost of being less interpretable for policymakers and epidemiologists.

Here, we show that a very basic, model-free forecasting approach provides effective short-term forecasts of COVID-19 fatalities. We refer to this method as "Euler forecast", owing to its mathematical connection to the Euler method [6, 7] that is used in computational mathematics to calculate the value of a function at a future time step based on the current rate of change.

* anino@ethz.ch † lucasb@g.ucla.edu

We collected data on CDC ensemble forecasts between June 2020 and June 2021 [1]. Ensemble forecasts are available for cumulative and weekly incidence numbers and for forecasting horizons of one to four weeks. All forecasts use data from the Johns Hopkins Coronavirus Resource Center [8] as reference. Forecasts are made for epidemiological weeks, which run Sunday through Saturday.
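The mapping from a forecast date to its target dates can be sketched in a few lines of Python. This is a minimal illustration rather than the Forecast Hub's reference implementation: the function name `target_date` is hypothetical, and the rule used here (the Saturday that closes the epidemiological week containing the forecast date, shifted by whole weeks) is an assumption that is meant for forecasts issued on a Sunday. For other weekdays, the submission instructions [9] should be consulted.

```python
from datetime import date, timedelta


def target_date(forecast_date: date, horizon_weeks: int) -> date:
    """Return the Saturday that ends the target epidemiological week.

    Assumption (not taken from the paper): the one-week target is the
    Saturday closing the Sunday-to-Saturday week that contains
    `forecast_date`; longer horizons add whole weeks.
    """
    if horizon_weeks < 1:
        raise ValueError("horizon_weeks must be at least 1")
    # date.weekday(): Monday = 0, ..., Saturday = 5, Sunday = 6
    days_to_saturday = (5 - forecast_date.weekday()) % 7
    return forecast_date + timedelta(days=days_to_saturday + 7 * (horizon_weeks - 1))


if __name__ == "__main__":
    made_on = date(2020, 6, 7)  # a Sunday
    for horizon in (1, 2, 3, 4):
        print(horizon, target_date(made_on, horizon))
```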
As an example, if forecasts with one- and four-week forecasting horizons are made on June 7, 2020, the corresponding target dates are June 13, 2020 and July 4, 2020 [9].

We compare CDC ensemble forecasts of COVID-19 fatalities with a simple and easily interpretable forecasting method. To do so, let $y(t)$ be the incidence of COVID-19 fatalities at time $t$. We use $\dot{y}(t)$ to denote the rate of change of $y(t)$ at time $t$. Forecasting the incidence $y(t + \Delta t)$ at a target time $t + \Delta t$ requires us to find an estimate of this quantity at an earlier time $t$. A straightforward way to construct short-term forecasts is to use the current rate of change $\dot{y}(t_0)$ and determine a forecast at time $t_k = t_0 + k \Delta t$ according to the Euler method [6, 7]

$$\hat{y}(t_k) = y(t_0) + k \, \Delta t \, \dot{y}(t_0), \tag{1}$$

where $\hat{y}(t_k)$ denotes the forecast, and $\Delta t$ and $k = 1, 2, \dots$ represent a time step (e.g., one week) and the number of time steps in the forecasting horizon, respectively.

However, observed incidences are subject to observation noise that results from confounding factors including sampling bias, measurement errors, and reporting delays [10]. A possible way to "de-noise" observed data is to use previous weekly incidences instead of daily incidence levels. If observational noise can be reduced by averaging over a period of several days, daily errors are less pronounced on a weekly level. Still, the local derivative is quite sensitive to noise, and a noisy incidence correction term in Eq. (1) does not help in making accurate short-term forecasts. We can therefore impose some degree of regularity to reduce the level of noise with the following minimization:

$$\underset{w}{\arg\min} \; \sum_k (y_k - w_k)^2 + \lambda \sum_k (w_k - w_{k-1})^2, \tag{2}$$

where $y_k = y(t_0 + k \Delta t)$, $w_k = w(t_0 + k \Delta t)$ is a regularized approximation of $y_k$, and $\lambda$ is a regularization parameter. In the limit $\lambda \to 0$, the argument of Eq. (2) is minimized if $w(t)$ approaches $y(t)$. In the limit $\lambda \to \infty$, the argument of Eq. (2) is minimized if $w(t)$ is constant (i.e., if $w_k - w_{k-1} = 0$). This optimization problem has an equivalent Euler-Lagrange formulation for numerical differentiation [11, 12].
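As a hedged illustration of the regularization step, the sketch below assumes that both terms of the minimization are quadratic, i.e., that one minimizes $\sum_k (y_k - w_k)^2 + \lambda \sum_k (w_k - w_{k-1})^2$. Under this assumption, setting the gradient to zero gives the linear system $(I + \lambda D^\top D)\, w = y$, where $D$ is the first-difference matrix. The function name `smooth_incidence` and the toy numbers are illustrative and are not taken from the authors' code [13].

```python
import numpy as np


def smooth_incidence(y, lam):
    """Return the minimizer w of sum (y_k - w_k)^2 + lam * sum (w_k - w_{k-1})^2.

    Assumption: quadratic data-fit and quadratic difference penalty.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # First-difference matrix of shape (n - 1, n): (D @ w)[k] = w[k + 1] - w[k]
    D = np.diff(np.eye(n), axis=0)
    # Normal equations of the quadratic objective
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, y)


if __name__ == "__main__":
    weekly_deaths = [700, 820, 910, 1050, 980, 1100]  # made-up weekly counts
    for lam in (0.0, 1.0, 100.0):
        print(lam, np.round(smooth_incidence(weekly_deaths, lam), 1))
```

For $\lambda = 0$ the function returns the input unchanged, and for very large $\lambda$ it approaches a constant equal to the mean of the observations, mirroring the two limits discussed above.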
Values of $\lambda \in (0, \infty)$ yield functions $w(t)$ that are smoothed versions of $y(t)$ with respect to the discrete rate of change $w_k - w_{k-1}$. Finally, the regularized Euler short-term forecast is given by

$$\hat{y}(t_k) = y_0 + k \, (w_0 - w_{-1}), \tag{3}$$

where $\hat{y}(t_k)$ denotes the forecast for time $t_k = t_0 + k \Delta t$ and $w_0 - w_{-1}$ is the regularized rate of change per time step at time $t_0$. In the following section, we utilize the regularized Euler method to generate forecasts of reported COVID-19 fatalities. Our source codes are publicly available at [13].

Figure 1 shows CDC ensemble forecasts (solid blue lines) of the weekly incidences of reported COVID-19 fatalities. Fig. 1 also shows Euler-method forecasts (solid red lines) of weekly incidences of COVID-19 fatalities in the U.S. We observe that, for the majority of data points, one-week CDC ensemble forecasts are not more accurate than one-week Euler forecasts [Fig. 1(a)], which we use as a local-derivative-based forecasting benchmark. Although Euler and CDC forecasts still exhibit a similar structure for a four-week forecasting horizon [Fig. 1(b)], the Euler method is associated with larger deviations from the reported fatalities than the CDC ensemble method. To quantify differences in forecasting errors between the two methods, we use

$$\delta(t) = |x(t) - y(t)|$$

to denote the absolute error between a forecast $x(t)$ (Euler or CDC ensemble) and the reported incidence $y(t)$ for target time $t$. Figure 1(c,d) shows the 4-week moving averages of weekly forecasting errors $\delta(t)$ (solid lines) of the Euler (red) and CDC ensemble (blue) methods. As suggested by our above discussion of Fig. 1(a,b), we observe that, for a one-week forecasting horizon, the error of the Euler method is on the whole somewhat smaller than that of the ensemble forecast. In about 61% of the forecasting instances shown in Fig. 1(a), the regularized Euler method has a smaller error than the CDC ensemble forecast. The cumulative forecasting errors are 49,925 (Euler) and 52,885 (CDC), a difference of about 6%. Without correction term [i.e., for $k = 0$ in Eq. (3)], the cumulative forecasting error of the Euler method is 52,660, again smaller than that of the CDC ensemble forecast.
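The forecasting and evaluation steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation [13]: the smoothing uses a quadratic difference penalty, the forecast is taken to be the last observation plus $k$ times the last smoothed difference $w_0 - w_{-1}$ (so that $k = 0$ reduces to a shift of the incidence curve), and all function names and numbers are hypothetical.

```python
import numpy as np


def smooth_incidence(y, lam):
    """Quadratically regularized smoothing (assumed form of the penalty)."""
    y = np.asarray(y, dtype=float)
    D = np.diff(np.eye(y.size), axis=0)  # first-difference matrix
    return np.linalg.solve(np.eye(y.size) + lam * (D.T @ D), y)


def regularized_euler_forecast(history, k=1, lam=1.0):
    """Last observation plus k times the last smoothed difference.

    `history` holds the observed incidences up to and including t_0
    (at least two values). With k = 0 the forecast equals the last
    observation, i.e., a shifted incidence curve.
    """
    history = np.asarray(history, dtype=float)
    w = smooth_incidence(history, lam)
    return float(history[-1] + k * (w[-1] - w[-2]))


def compare_forecasts(forecast_a, forecast_b, reported):
    """Cumulative absolute errors and share of instances where A beats B."""
    r = np.asarray(reported, dtype=float)
    err_a = np.abs(np.asarray(forecast_a, dtype=float) - r)
    err_b = np.abs(np.asarray(forecast_b, dtype=float) - r)
    return {
        "cumulative_a": float(err_a.sum()),
        "cumulative_b": float(err_b.sum()),
        "share_a_smaller": float(np.mean(err_a < err_b)),
    }


if __name__ == "__main__":
    weekly = [700, 820, 910, 1050, 980, 1100, 1230, 1190]  # made-up counts
    euler, shift, truth = [], [], []
    for i in range(3, len(weekly) - 1):
        history = weekly[: i + 1]  # data available at forecast time
        euler.append(regularized_euler_forecast(history, k=1, lam=1.0))
        shift.append(regularized_euler_forecast(history, k=0))
        truth.append(weekly[i + 1])  # value reported one week later
    print(compare_forecasts(euler, shift, truth))
```

The function `compare_forecasts` returns the cumulative absolute errors of two methods and the share of forecasting instances in which the first method has the smaller error, which are the two kinds of summary statistics reported in the text.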
Note that a forecast without correction term corresponds to a simple shift of the incidence curve. For a four-week forecasting horizon [Fig. 1(d)], the cumulative error of the CDC ensemble forecast is 87,717, about 35% smaller than that of the Euler method.

Our results suggest that easily interpretable methods like the Euler method, a model-free local-derivative-based forecasting benchmark, provide an effective alternative to more complex epidemic forecasting frameworks on short-term forecasting horizons. Similar conclusions were drawn in a recent study [14] that compared Euler-like forecasts with those generated by Google Flu Trends. Regularized Euler forecasts have smaller errors than CDC ensemble forecasts on one-week forecasting horizons in about 61% of all cases. Simple curve shifts without regularization provide better one-week forecasts in 63% of all cases, yet with a mean absolute error that is about 5% larger than that found for regularized Euler forecasts. For longer forecasting horizons, it is not surprising that CDC forecasts, which rely on additional input data as well as epidemiological and statistical models, become more accurate than Euler-like forecasting benchmarks.

One clear advantage of Euler forecasting methods is that they are less labor- and resource-intensive than more complex forecasting models, which often rely on the knowledge of expert groups and require specialized computing infrastructure. In their simplest implementation, Euler forecasts use the currently observed incidence rate as an estimate of the incidence rate in the following week. The regularization in Eqs. (2) and (3) can help further improve such data-driven forecasts. In agreement with [14], our results emphasize the importance of benchmarking complex forecasting models against simple forecasting baselines to further improve forecasting accuracy.
Our study also relates to recent findings on algorithm rejection and aversion [15], according to which "people have diminishing sensitivity to forecasting error" and "people are less likely to use the best possible algorithm in decision domains that are more unpredictable". Finally, in highly uncertain and noisy forecasting regimes, simple methods tend to outperform more complex methods because of a more favorable bias-variance tradeoff [16].

References
The COVID-19 Forecast Hub
Modeling infectious diseases in humans and animals
Unifying continuous, discrete, and hybrid susceptible-infected-recovered processes on networks
Applied time series analysis: A practical guide to modeling and forecasting
Institutiones calculi integralis
An interactive web-based dashboard to track COVID-19 in real time
Data submission instructions
Using excess deaths and testing statistics to determine COVID-19 mortalities
Numerical differentiation and regularization
Numerical differentiation of noisy, nonsmooth data
Transparent modeling of influenza incidence: Big data or a single data point from psychological theory?
People reject algorithms in uncertain decision domains because they have diminishing sensitivity to forecasting error
The elements of statistical learning