key: cord-0518358-7d1unn0z authors: Juul, Jonas L.; Graesboll, Kaare; Christiansen, Lasse Engbo; Lehmann, Sune title: Fixed-time descriptive statistics underestimate extremes of epidemic curve ensembles date: 2020-07-09 journal: nan DOI: nan sha: 8c8493ec94ac377ca64f01501b1e5ba6322460b5 doc_id: 518358 cord_uid: 7d1unn0z Across the world, scholars are racing to predict the spread of the novel coronavirus, COVID-19. Such predictions are often pursued by numerically simulating epidemics with a large number of plausible combinations of relevant parameters. It is essential that any forecast of the epidemic trajectory derived from the resulting ensemble of simulated curves is presented with confidence intervals that communicate the uncertainty associated with the forecast. Here we argue that the state-of-the-art approach for summarizing ensemble statistics does not capture crucial epidemiological information. In particular, the current approach systematically suppresses information about the projected trajectory peaks. The fundamental problem is that each time step is treated separately in the statistical analysis. We suggest using curve-based descriptive statistics to summarize trajectory ensembles. The results presented allow researchers to report more representative confidence intervals, resulting in more realistic projections of epidemic trajectories and -- in turn -- enable better decision making in the face of the current and future pandemics. Accurately communicating uncertainty is essential when forecasting. We are currently witnessing an unprecedented effort of scholars across countries and fields, competing to most accurately predict the trajectory of the novel coronavirus, COVID-19, using a plethora of approaches. These epidemic forecasts are used by governments aiming to soften the enormous consequences that the pandemic has on economy and health worldwide. Particularly crucial to decision makers is information about the overall severity of the epidemic and, in particular, whether local hospitals will be overwhelmed. Political decisions to reopen countries and borders are calculated risks. For that reason, it is of utmost importance that uncertainties and confidence intervals associated with forecasted epidemic trajectories are reliable and well-communicated. To forecast the trajectory of the novel virus, many researchers simulate the spread of the epidemic using mathematical models. Different classes of models are used for this purpose, including deterministic or stochastic compartmental epidemiological models [1, 2] and individualbased models [3] . Given initial conditions of the epidemic prevalence in the population and a number of epidemiological and non-epidemiological parameters, these models can simulate hypothetical spreading scenarios. In reality, the initial conditions and parameters are not known exactly -especially not for a new virus like COVID-19. For this reason, forecasts are often made by simulating a large number of plausible combinations of these inputs, and then summarizing the resulting ensemble of epidemic curves using descriptive statistics. Regardless of the kind of model which produced the ensemble of epidemic trajectories, the ensemble is usually summarized using fixed-time descriptive statistics, see for * jlju@dtu.dk † sljo@dtu.dk example Refs. [3] [4] [5] [6] [7] . For each time step, the instantaneous value of curves are ranked from smallest to largest and (possibly weighted) percentiles are computed. These percentiles are then used to produce confidence intervals for the forecast on the given time step -for example, the 50% least extreme values could be shown by marking the interval making out the 25 th to 75 th percentiles. Below, we show that in the context of forecasting trajectory extremes, however, such fixed-time approaches to producing confidence intervals suffer from a serious deficiency. This type of fixed-time statistics is biased against showing the projected peaks of the curves, and thus could obscure the part of the forecast most essential to decision makers. To illustrate the pitfalls of using fixed-time descriptive statistics to summarize ensembles of simulated epidemic trajectories, let us recount the fictional tale of the inhabitants of Transmithaca. On the island of Transmithaca, one million people lived in complete isolation from the rest of the world. A virus had ravaged the outside world, and in the process all viral parameters had become known with perfect precision. As Transmithaca slowly opened up for outside visitors, the inhabitants knew everything about the virus -except when it would arrive. The leaders of Transmithaca asked their epidemiologists to estimate how the disease would impact society. The epidemiologists simulated a number of scenarios, all with perfect choices of parameters, but different starting dates for the epidemic. Their simulations produced an ensemble of epidemic curves and, thinking that the individual simulated epidemic trajectories might clutter the picture, they presented the fixed-time summary statistics shown in grey and black in Fig. 1A that might infect between 2 000 and 3 000 individuals at peak impact. As we can inspect, however, from the ensemble of time-displaced curves, the actual peak impact in every single case is more than 4 000 cases. The Tragic Tale of Transmithaca is of course a caricature, but it illustrates a fundamental problem in the way ensembles of simulated epidemic trajectories are currently summarized. The future course of each curve is decided by the parameters and, crucially, the entire past of the curve. These long-term correlations in the shapes of trajectories imply that basing summary statistics on single points in time separately can be misleading. Because a single curve can enter and leave the marked percentiles, the fixed-time descriptive statistics cannot be expected to capture what a real trajectory might look like. In the case of Transmithaca, the summary underestimates the infected-count at peak impact, even though this quantity is identical across all curves. This is particularly unfortunate given that -if you are in charge of pandemic response -a reliable understanding of peak impact is essential. Fig. 1B illustrates that using fixed-time statistics to summarize ensembles can be very inaccurate if curves intersect. The figure also shows that fixed-time www.github.com/jonassjuul/curvestat. A Python package, curvestat, to produce the curve-based descriptive statistics used in this manuscript can be cloned from www.github.com/jonassjuul/curvestat statistics are accurate in some special cases, for example when every point on each curve constitutes the same fixed-time percentile for every time step. This naturally leads us to the question: Do curves cross in real ensembles of simulated epidemic trajectories? We believe that the answer to this question is: Most likely, yes. We have been involved in the task of forecasting the epidemic in Denmark, and our sampling of parameters produced an ensemble of trajectories with many such intersections [9] . In Fig. 2A we show an ensemble of epidemic trajectories produced as part of the Danish COVID-19 forecasting effort. This particular ensemble shows projected daily hospitalizations in the period May 5 -October 1 and was produced under a worstcase assumption that Danes would stop practising social distancing. 2 For this ensemble, 67% of curve peaks lie above the 75 th fixed-time percentile on the given day. For two other curve ensembles -which were produced with best-case and moderate assumptions on social distancing, respectively -38% and 54% of peaks lie above the 75 th percentile. It is clear that percentiles calculated using fixed time statistics (shown using solid black lines) systematically underestimate peak values in these examples. As an alternative to using fixed-time statistics in order to recapitulate the desired features of curve ensembles accurately, we propose 2 alternative summary statistics: 1) Curve-based descriptive statistics; and 2) Summarizing estimated likelihoods of specific scenarios of interest. Curve-based descriptive statistics. Whereas fixed-time descriptive statistics separately evaluates the centrality of instantaneous curve values, curve-based descriptive statistics rank and visualize centrality of entire curves. We suggest using curve box plots to visualize trajectory confidence intervals. Curve box plots are sometimes used for summarizing functional ensembles in simulation sciences [10, 11] and the procedure for constructing a curve box plot is straightforward, 1. Rank curves from more central to less central, 2. Plot the envelope containing the most central curves. Here, we define the envelope, E(S), of a set of curves, S, as the area spanned by the curves in S. More precisely, a point (t , y) is contained by the envelope of S if there exist curves c i (t), c j (t) ∈ S such that c i (t ) ≤ y ≤ c j (t ). There are different ways one can rank the centrality of curves, each having its merits. With an all-or-nothing ranking method all curves start with a centrality score of FIG. 2. Curve-based descriptive statistics to summarize curve ensembles. A Individual curves from an ensemble of 500 simulated epidemic trajectories produced as part of the Danish COVID-19 forecasting effort. B-E Curve box plot of the most central 50% (blue) curves plotted alongside the 25 th -75 th percentiles computed as fixed-time descriptive statistics (black lines). Dots colored using a gaussian kernel density estimator [8] shows position and density of peaks of the 50% most central trajectories. Insets show the curve box plot of the most central 90% (light blue) curves plotted alongside the 5 th -95 th percentiles computed as fixed-time descriptive statistics (grey lines) and a scatter plot showing all trajectory peaks in the ensemble. In B-E, the centrality of curves were ranked in different ways: B All-or-nothing ranking for the full predicted time interval, Ncurves = 50 and N samples = 100; C All-or-nothing ranking using only the part of curves between day 50-100, Ncurves = 10 and N samples = 100; D Weighted ranking with a reward f (t) = e −(t−t 0 ) ln(2)/7 for all curves, t0 being the current date. In this case, the reward for being contained in a sampled envelope gets half as big for every 7 days. E Curves ranked according to their predicted peak number of newly hospitalized patients. The median prediction was deemed most central. F Summarizing the likelihood of certain scenarios as predicted by the curve ensemble. The heatmap color on the point (x, y) indicates the fraction of ensemble curves that at some point predict at least x consecutive days with at least y newly hospitalized patients on each day. 0. N curves curves are drawn uniformly at random from the ensemble and their envelope, E sample , is constructed. We then check which ensemble curves are entirely contained by E sample : curve c i gets s(c i ) added to its centrality score only if all points constituting the curve are contained in E sample . The curve-dependent score s(c i ) allows prior information to inform the ranking, e.g. a curve's fit to existing data. Uniformly random samples like this are drawn N samples times in total, and each time, centrality scores of all curves are updated. In the end, curves with more centrality points are more central in the ensemble. In Fig. 2B and C we show curve box plots created using all-or-nothing rankings. We also show the fixed-time descriptive statistics for the ensemble. An alternative to the all-or-nothing ranking is what we will call a weighted ranking: rewarding curves for each time step they are contained in E sample . Again, one draws N curves uniformly randomly from the ensemble. Now, however, we add s(c i )f (t) to curve c i 's centrality score if c i is contained by E sample at time t. The time dependency of the reward can reflect, for example, that some forecasts are expected to be very accurate in the near future but decrease in accuracy with time. Fig. 2D shows curve box plots obtained with a time-dependent weighted ranking like this. In addition to the ranking methods mentioned above, one can rank the curves according to some feature of interest. In Fig. 2E we show the curve box plots obtained when we rank curves according to their projected maximum values of newly hospitalized cases in a single day; in other words, the median projected peak value received the highest centrality. Likelihoods of specific scenarios of interest. The curve box plots introduced above each visualizes an area that contains a fraction of the ensemble curves. Sometimes, however, we may want to go beyond rough estimates of the temporal course of trajectories. It might be more interesting to quantify the risk of certain scenarios happening. For example, consistent large numbers of hospitalized patients for long periods places a serious burden on healthcare systems [12, 13] . The risk of such scenarios can be explicitly evaluated by counting how often they occur in the ensemble of curves. For example, if half of the simulated curves predict that hospitals will get at least 300 new patients every day for at least 20 consecutive days, and all curves are considered equally likely, the probability of this scenario is estimated to be 50%. Fig. 2F shows a heatmap communicating this type of risks. The color of the point (x, y) in this figure, indicates the fraction of ensemble curves that -at some point -have at least x consecutive days with at least y newly hospitalized patients on each day. On top of the heatmap, we plot the corresponding contour plot. From Fig. 2F we directly read off that given this model, the risk of receiving at least 200 new patients for at least 1 days in a row is less than 50% and that there is less than 1% risk of receiving at least 400 new hospitalizations each day for at least 40 consecutive days. If curves are not considered equally probable, but instead each curve, c i , carries a weight, w(c i ), we can generalize the above analysis. In that case, the heatmap value at (x, y) would instead be i∈I w(i)/ j∈S w(j), where I is the set of curves that predict at least x consecutive days with at least y new patients hospitalized on each day. In summary, when making forecasts of epidemic trajectories, it is important to represent the resulting curve ensembles in a way that captures the quantities of interest in an intuitive way. Here we have argued that computing confidence intervals using fixed-time descriptive systematically suppresses trajectory extremes. This is natural. Fixed-time descriptive statistics are designed to show the least extreme predictions on a given date, not to take entire curves into account. In a situation where the projected peak numbers of hospitalized patients are of the utmost importance to decision makers and the public, however, this is unfortunate. We hope that this paper raises awareness of this pitfall of fixed-time descriptive statistics for summarizing ensembles of epidemiological trajectories. In addition to identifying this shortcoming, we have suggested how curve ensemble extremes can be summarized and visualized instead. The methods we have suggested here -plotting curve box plots and estimating likelihoods of specific scenarios of interest -focus on ensemble trajectories instead of single time steps. This approach provides forecasters with statistical tools that do not filter out curve extremes. At the same time, however, it is clear that there is still im-portant research to be done. We have suggested different ways of ranking centrality of ensemble curves. Each ranking has its merits: For example, the all-or-nothing ranking penalizes curves which are shaped differently than other ensemble curves and a weighted ranking can reward curves that are central in the very near future. A thorough investigation of these merits and trade-offs is needed. Until this has been investigated, we encourage researchers to be creative and mindful about the problems that each statistical method could have. And to communicate this uncertainty openly to decision makers. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation Los Alamos National Laboratory COVID-19 team Covid scenario pipeline Multivariate density estimation: theory, practice, and visualization Tillaegsrapport af den 20. maj 2020 The Lancet Respiratory Medicine The authors are thankful to the members of the SSI COVID-19 modeling group for an excellent collaboration and to Carl T. Bergstrom for comments on an early version of the manuscript. J.L.J and S.L. received additional funding through the HOPE project (Carlsberg Foundation). J.L.J. and S.L. conceived the idea.J.L.J. performed simulations, analysis and calculations. K.G. and L.E.C. devised and performed epidemiological simulations. All authors contributed to discussions and wrote the manuscript.