key: cord-0295103-2j9ugbbq authors: Kriston, L. title: A statistical definition of epidemic waves date: 2022-05-07 journal: nan DOI: 10.1101/2022.05.04.22274677 sha: 189a9b1f707c2cd92b86472ea18b3fd11af656d7 doc_id: 295103 cord_uid: 2j9ugbbq

The timely identification of expected surges of cases during infectious disease epidemics is essential for allocating resources and preparing interventions. This study describes a simple way to evaluate whether an epidemic wave is likely to be present based on daily new case count data. The proposed measure compares two models that assume exponential or linear dynamics, respectively. Technically, the output of two regression analyses is used to approximate a Bayes factor, which quantifies the support for the exponential over the linear model and can be used for epidemic wave detection. The trajectory of the coronavirus epidemic in three countries is analyzed and discussed for illustration. The proposed measure detects epidemic waves at an early stage, which are otherwise visible only by inspecting the development of case count data retrospectively. In addition to informing public health decision making, the outlined approach may serve as a starting point for scientific discussions on epidemic waves.

The course of infectious disease epidemics is frequently described by referring to 'waves', even though a consensual definition of what constitutes an epidemic wave is currently missing. [1] [2] [3] Some consider the term a useful metaphor referring to a sustained upsurge (frequently called 'spike') in the number of sick individuals (cases). [4] From an even broader perspective, a complete wave includes a rise in the number of cases, a defined peak, and a decline. In the present work, I focus only on the first, rising, phase of epidemic waves.
Recently, it has been suggested to use the mean of the effective reproduction number R (which refers to the average number of individuals infected by a single infectious individual during a running epidemic) over the past 14 days to operationalize epidemic waves. [3] This working definition is certainly useful for putting discussions on epidemic waves on a more objective footing. However, as the authors acknowledge, it describes a 'sustained upward period' rather than an upsurge in the number of cases. In addition, by calculating the average of equally weighted data points in a defined period, it discards the temporal information that is present in the data. Technically, the unrestricted spread of infectious diseases is commonly characterized by an exponential growth of the number of confirmed cases, while reduced virus transmission and reproduction decelerate growth to a subexponential rate. [5] [6]

The aim of this study was to provide a statistical measure of epidemic waves by determining whether the dynamics of an epidemic within a certain time horizon are more likely to be exponential than linear. The proposed measure is based on the time series of the observed daily new cases, as using cumulative data can lead to biased conclusions. [7] While an exponential growth of the total case counts implies an exponential growth of the daily new case counts, it is assumed here that a typical subexponential growth of the total case counts can be well approximated by a linear growth of the daily new case counts. Although the term 'growth' is used here, it should be noted that the suggested indicator does not differentiate between increasing and decreasing new case counts per se. Thus, it can detect not only exponential surges but also exponential decline.
However, as the present study focused on epidemic surges, i.e., the increasing phase of epidemic waves, the proposed measure was calculated only if the exponent of the exponential function exceeded one (i.e., if the number of daily cases was increasing rather than declining).

The proposed epidemic wave indicator is a Bayes factor that quantifies the strength of evidence that the dynamics of an epidemic are exponential rather than linear. It is calculated using the Bayesian information criterion approximation method from the coefficient of determination of two linear models. [8] [equations missing in source] Here, $R^2_{\mathrm{exp}}$ and $R^2_{\mathrm{lin}}$ refer to the coefficient of determination in the exponential and linear models, respectively. Merging these two equations leads to the formula expressing the strength of support for the exponential over the linear model. A value of one indicates that exponential and linear dynamics have the same probability. Values above one support exponential dynamics, while values below one support linear dynamics. If necessary, thresholds for interpretation are available, classifying a Bayes factor between 1 and 3 as weak, between 3 and 20 as positive, between 20 and 150 as strong, and above 150 as very strong evidence. [8] The thresholds of 3, 20, and 150 correspond to approximately a 75, a 95, and a 99 percent probability that the exponential model is true, if we assume that the two models were equally probable before seeing the data. [9] In the present study, a 95 percent bootstrap interval was created for the Bayes factor estimates with 500 samples in order to gain an impression of the uncertainty related to the data. All analyses were performed in R. [10] The annotated code can be found in the Supplement.
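The calculation described above can be sketched in a few lines of code. The following Python example is an illustration under stated assumptions, not the author's R code from the Supplement. It assumes the standard BIC approximation BIC ≈ n ln(1 − R²) + k ln(n) and BF ≈ exp((BIC_lin − BIC_exp) / 2). Because both models have the same number of parameters, the k ln(n) terms cancel and the Bayes factor reduces to ((1 − R²_lin) / (1 − R²_exp))^(n/2). It further assumes that the exponential model is fitted as a straight line to log-transformed counts, that all counts are strictly positive, that "increasing" means a positive slope on the log scale, and that the bootstrap resamples (day, count) pairs with replacement. The function names `wave_indicator` and `bootstrap_interval` are hypothetical.

```python
import math
import random


def _fit(t, y):
    """Least-squares line y ~ t; returns (slope, R^2)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((a - mt) ** 2 for a in t)
    syy = sum((b - my) ** 2 for b in y)
    sty = sum((a - mt) * (b - my) for a, b in zip(t, y))
    if stt == 0 or syy == 0:
        return 0.0, 0.0  # degenerate input: treat as "nothing explained"
    r2 = min(1.0, sty * sty / (stt * syy))  # clamp rounding error above 1
    return sty / stt, r2


def wave_indicator(t, counts):
    """Approximate Bayes factor (exponential over linear); None if not growing."""
    if any(c <= 0 for c in counts):
        raise ValueError("this sketch assumes strictly positive counts")
    _, r2_lin = _fit(t, counts)                                  # linear model
    slope_log, r2_exp = _fit(t, [math.log(c) for c in counts])   # exponential model
    if slope_log <= 0:  # assumption: only rising phases are evaluated
        return None
    num, den = 1.0 - r2_lin, 1.0 - r2_exp
    if den == 0.0:
        return math.inf if num > 0.0 else 1.0
    if num == 0.0:
        return 0.0
    # Work on the log scale to avoid overflow for long windows.
    log_bf = 0.5 * len(counts) * (math.log(num) - math.log(den))
    return math.exp(log_bf) if log_bf < 700 else math.inf


def bootstrap_interval(t, counts, n_boot=500, level=0.95, seed=1):
    """Percentile interval from resampled (day, count) pairs (an assumption)."""
    rng = random.Random(seed)
    n = len(counts)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bf = wave_indicator([t[i] for i in idx], [counts[i] for i in idx])
        if bf is not None:
            values.append(bf)
    if not values:
        return None
    values.sort()
    lo = values[int((1 - level) / 2 * (len(values) - 1))]
    hi = values[int(round((1 + level) / 2 * (len(values) - 1)))]
    return lo, hi


if __name__ == "__main__":
    days = list(range(28))                         # a four-week time horizon
    surge = [round(20 * 1.08 ** d) for d in days]  # synthetic, made-up counts
    print(wave_indicator(days, surge))
    print(bootstrap_interval(days, surge))
```

The returned value can be read against the thresholds given above (3, 20, 150). Evaluating the function on a sliding window of the most recent one, two, or three months of daily counts would give an indicator per day and time horizon.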
For illustration, the proposed Bayes-factor-based epidemic wave indicator was calculated for the coronavirus epidemic in the United States, in the United Kingdom, and in Germany with a time horizon of one, two, and three months, using data from the World Health Organization from initiation until

Identifying epidemic waves from the time series of daily new case counts is challenging, even retrospectively. This is particularly true if the apparent waves follow each other swiftly and/or build upon each other. Instead of five to six waves as described above, data from all three countries are consistent with the interpretation of three 'big' waves, the first ending in the spring of 2020, the second running through autumn and winter of 2020/21, and the third centering on the winter of 2021/22. These three 'big' waves are all identified very clearly and early by the proposed indicator.

The proposed approach makes clear that judgments on epidemic waves depend on the timeframe of reference and that apparently visible patterns in case count data may provide a subjective and/or incomplete picture. The measure outlined in this study is scalable to any geographic region and takes possible irregularities of the data into account. A central limitation of the presented approach is that it relies on the number of reported cases, which can be subject to inconsistencies due to variation in reporting and testing strategies. Thus, the identified waves do not necessarily reflect changes in the true number of infections.
However, it is unlikely that testing and reporting strategies alone are able to produce epidemic waves with very strong support from the proposed epidemic wave indicator. The calculation of bootstrap intervals (which should be interpreted as reference intervals rather than traditional confidence limits) can be helpful for assessing data-related uncertainty of the calculations. Still, this issue deserves further exploration. Another challenge is posed by the question of which time horizon should be used to calculate the wave indicator. In the examples, indicators with a longer time horizon (two and three months) seem to have detected epidemic waves better and more clearly. However, the choice is likely to depend on the characteristics of the waves that the measure is intended to describe. For epidemics with an annual periodicity of major waves, a time horizon of several months might be appropriate. However, until clearer guidance is available, I suggest using multiple timeframes, as was done in the present study. An interesting characteristic of the proposed measure is that it can also be used to detect phases of exponential decline in new case counts, which was not followed up in the present study and has not received much attention in general yet. Future modelling and empirical studies may explore whether an exponential rather than linear decline may provide valuable information regarding epidemic dynamics. Given that even central epidemiological concepts lack a consensual definition, [12] [13] thinking about epidemic waves formally as trends with specific characteristics in time series data may be a fruitful perspective. [14] Although the proposed measure is intended to be a descriptive indicator of epidemic waves, testing its value for prediction might be an interesting avenue of research.
In addition, analyzing its agreement with similar measures, such as the average of the effective reproduction number R across a defined period of time, [3] could be an informative focus of future studies. Even though the presented measure is approximate, relies on simplified assumptions, and needs further evaluation, it may contribute to putting discussions on epidemic waves on a more objective basis.

The copyright holder for this preprint (which was not certified by peer review), this version posted May 7, 2022, is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2022.05.04.22274677

Author contributions: LK conceptualized and designed the study, prepared the data and the software code, performed the analyses, interpreted the results, and wrote the manuscript.
Data availability: The dataset was derived from sources in the public domain: World Health Organization Coronavirus Disease (COVID-19) Dashboard, https://covid19.who.int.
No funding was received for conducting this study. The author has no relevant financial or non-financial interests to disclose.
Ethics approval: This is a secondary analysis of anonymized aggregate data. No ethical approval is required.
The custom code is attached as supplementary material.

1. Centre for Evidence-Based Medicine. Covid 19 - Epidemic 'Waves'
2. A second wave?
3. What do people mean by COVID waves? - A working definition of epidemic waves
4. South Korea says it has a second wave of coronavirus infections - but what does that really mean?
5. Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China
6. Patterns of the COVID-19 pandemic spread around the world: exponential versus power laws
7. Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola
8. Bayesian model selection in social research
9. A practical solution to the pervasive problems of p values
10. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
11. World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard
12. Defining outbreak: breaking out of confusion
13. When is an epidemic an epidemic?
14. Assessing the strength of case growth trends in the coronavirus pandemic