key: cord-0427805-i6b1f2h2
authors: Parag, K. V.; Thompson, R. N.; Donnelly, C. A.
title: Are epidemic growth rates more informative than reproduction numbers?
date: 2021-04-20
journal: nan
DOI: 10.1101/2021.04.15.21255565
sha: cfcd0140784b56318f2e6f6b0e26bac022e45ff7
doc_id: 427805
cord_uid: i6b1f2h2

Summary statistics, often derived from simplified models of epidemic spread, inform public health policy in real time. The instantaneous reproduction number, $R_t$, is predominant among these statistics, measuring the average ability of an infection to multiply. However, $R_t$ encodes no temporal information and is sensitive to modelling assumptions. Consequently, some have proposed the epidemic growth rate, $r_t$, i.e., the rate of change of the log-transformed case incidence, as a more temporally meaningful and model-agnostic policy guide. We examine this assertion, identifying if and when estimates of $r_t$ are more informative than those of $R_t$. We assess their relative strengths both for learning about pathogen transmission mechanisms and for guiding epidemic interventions in real time.

The instantaneous reproduction number, $R_t$, measures the average number of secondary infections generated per effective primary case at time $t$. Policy decisions such as the imposition or release of interventions are often based on whether $R_t$ is larger or smaller than 1, which signifies that the epidemic is growing or waning, respectively (Anderson et al., 2020). However, $R_t$ encodes no temporal information. A value of $R_t = 2$, for example, indicates approximate epidemic doubling but not how quickly that doubling occurs. Moreover, because inference of $R_t$ depends on the model used (and hence its assumptions), differing estimates may be obtained from the same data, complicating the interpretation of $R_t$ as a signal for epidemic response (Lloyd, 2009).
Consequently, the instantaneous epidemic growth rate, $r_t$, defined as the rate of change of the log-transformed case incidence, has been proposed as a more informative and understandable measure of transmission dynamics (Pellis et al., 2020). Growth rates may be estimated directly from the gradient of the log-transformed observed incidence curve, have a natural temporal interpretation as the speed of case accumulation and still encode key dynamics, e.g., the signs of $r_t$ and $R_t - 1$ signify similar transmission trends. Estimates of $r_t$ can therefore, seemingly, be derived independently of an epidemic model. However, if a model is assumed, $r_t$ and $R_t$ are often bijectively related, meaning that there is a one-to-one correspondence between them. Thus, $R_t$ may provide no more information about transmission patterns than that available already from $r_t$ (Wallinga and Lipsitch, 2007).

While these observations may at first recommend $r_t$ as the more useful measure for policymaking, there are implicit complications. First, when comparing transmission across different spatial scales, epidemic phases or even data types (e.g., hospitalisations or cases), a non-dimensional parameter may be more useful. A value of $R_t = 2$ has the same interpretation of a primary case generating two secondary ones on average, regardless of the region studied or the phase of the epidemic considered, with important implications for interventions (e.g., if $R_t = 2$, then more than half of transmissions must be prevented for the epidemic to start declining). Second, the process of estimating the logarithmic derivative of a noisy incidence curve is not trivial and noise-smoothing choices may actually be equivalent to modelling assumptions. Third, information encoded in $R_t$ may be more easily leveraged to develop other useful outbreak analytics, such as probabilities of epidemic elimination or herd immunity thresholds (see Discussion and Hethcote (2000)).
Last, biases and delays in reporting and surveillance may have differing impacts on estimates of both quantities, making it unclear which offers the higher fidelity view of transmission (Lloyd, 2009).

In this paper, we investigate and discuss the various complexities and subtleties mediating the practical informativeness of estimates of both $R_t$ and $r_t$, which we denote $\hat{R}_t$ and $\hat{r}_t$, respectively. We outline how these quantities can be computed from incidence curves using renewal models and smoothing filters. This leads us to our main result: that the smoothing assumptions inherent in obtaining $\hat{r}_t$ from noisy incidence curves can be in some senses equivalent to the epidemiological ones necessary for obtaining $\hat{R}_t$. Consequently, we conclude that the question of whether $R_t$ or $r_t$ is more informative for real-time public health policymaking depends on the relative accuracy of the epidemiological assumptions and on how well the subtleties and uncertainties underlying each summary statistic are communicated. Estimates of $R_t$ and $r_t$ in combination, alongside contextual information about the ongoing epidemic, will provide the most complete picture of pathogen transmission and control.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted April 20, 2021. https://doi.org/10.1101/2021.04.15.21255565

Inferring the time-varying transmissibility of a pathogen from routinely available surveillance data is vital to assess ongoing and upcoming trends in epidemic dynamics. Among the most common data types is the incidence curve, which represents the time series of new cases. We use $I_t$ to denote the incidence at time $t$ and let $w_j$ be the probability that a primary case takes $j$ time units (usually in days) to generate a secondary case.
The set of $w_j$ for all $j$ constitutes the generation time distribution of the disease, where we make the common assumption that the generation time distribution is approximated by the serial interval distribution (Wallinga and Teunis, 2004; Cori et al., 2013). The serial interval distribution describes the times between symptom onsets for primary and secondary cases and is often computed from independent line-list data (Thompson et al., 2019). We assume that the set of $w_j$ has been well characterised for the infectious disease of interest.

The renewal model (Fraser, 2007) relates the instantaneous reproduction number at time $t$, $R_t$, to the incidence curve and generation time distribution as in Eq. (1), with $\mathbb{E}[I_t]$ indicating the mean of $I_t$. Typically, an assumed distribution (e.g., Poisson or negative binomial) is used to statistically relate this mean to $I_t$, and estimates of $R_t$ (i.e., $\hat{R}_t$) are obtained using various Bayesian or maximum likelihood computational approaches. The total infectiousness, $\Lambda_t$, summarises how past incidence propagates forwards in time by incorporating knowledge of the generation time distribution via a convolution. Many approaches exist for inferring $\hat{R}_t$ from the incidence curve $\{I_1, I_2, \ldots, I_T\}$, with $T$ as the last observed time (see Anderson et al. (2020) for more details). These estimates, $\hat{R}_t$, are increasingly employed for tracking transmissibility during epidemics and guiding public health responses.

The instantaneous growth rate, $r_t$, has been used less frequently to assess transmissibility over time but has recently gained attention as an alternative to $R_t$ (Pellis et al., 2020; Dushoff and Park, 2021) and is among the metrics that COVID-19 advisory bodies track. The quantity $\hat{r}_t$ can be derived from $\{I_1, I_2, \ldots, I_T\}$ without additional epidemiological knowledge or assumptions, e.g., no estimated generation time distribution is required.
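The renewal relation of Eq. (1) described above can be sketched in code. The display equation is not reproduced in this text, so the sketch assumes the standard renewal form $\mathbb{E}[I_t] = R_t \Lambda_t$ with $\Lambda_t = \sum_{j=1}^{t-1} w_j I_{t-j}$. The function names are hypothetical, and the ratio $I_t / \Lambda_t$ is only a crude moment-style estimate, not the Bayesian or maximum likelihood procedures mentioned above.

```python
import numpy as np

def total_infectiousness(incidence, w):
    """Assumed form: Lambda_t = sum_{j>=1} w_j * I_{t-j}.

    incidence[0] is the first observed time point and w[0] holds w_1.
    """
    incidence = np.asarray(incidence, dtype=float)
    w = np.asarray(w, dtype=float)
    lam = np.zeros_like(incidence)
    for t in range(1, len(incidence)):
        k = min(t, len(w))  # only as much history as is available
        # pair w_1..w_k with I_{t-1}..I_{t-k}
        lam[t] = np.dot(w[:k], incidence[t - k:t][::-1])
    return lam

def naive_reproduction_number(incidence, w):
    """Crude ratio estimate I_t / Lambda_t (NaN where Lambda_t is zero)."""
    incidence = np.asarray(incidence, dtype=float)
    lam = total_infectiousness(incidence, w)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(lam > 0, incidence / lam, np.nan)

# Illustrative (made-up) inputs: incidence doubling each step, all weight on lag 1
print(naive_reproduction_number([1, 2, 4, 8], [1.0]))
```

With all generation time weight on a lag of one step, incidence that doubles at every step yields a ratio of 2 wherever history exists, which matches the interpretation of $R_t = 2$ given earlier.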
To obtain $\hat{r}_t$, the logarithmic derivative of some smoothed version of the incidence, $S_t$, is used, as in the left side of Eq. (2). There are various ways of obtaining suitable $S_t$ curves (e.g., using splines or moving average filters (Pellis et al., 2020)). We can unify many of these approaches within the framework of Savitzky-Golay (SG) filters (Savitzky and Golay, 1964). SG filters, with dimension $m$ and coefficients $a_j$, perform least-squares polynomial smoothing via the convolution in the right side of Eq. (2). We can realise a standard moving average filter by setting each $a_j = 1/m$, for example.

The reproduction numbers and growth rates we consider should not be confused with the basic reproduction number, $R_0$, and the intrinsic growth rate, $r$, which can be estimated using numerous methods (e.g., via compartmental or Richards' growth models (Yan and Chowell, 2019)) from various data sources (e.g., prevalence or cumulative case data). While these are related to our $R_t$ and $r_t$ during the earliest phases of an epidemic, $R_0$ and $r$ cannot track time-varying changes in transmissibility. Instead, they provide insight into initial epidemic growth upon invasion (Anderson et al., 2020), and so we do not examine them. Note that all methods we detail in this section assume that the population is well mixed and that contacts are random (White et al., 2020). Next, we clarify how $R_t$ and $r_t$ are related.

Eq. (1) and Eq. (2) describe simple and general approaches to estimating $R_t$ and $r_t$ from an incidence curve. While a known generation time distribution or serial interval is assumed in Eq. (1) when determining $\hat{R}_t$, Eq. (2) neither makes mechanistic assumptions nor requires additional data for calculating $\hat{r}_t$. However, if the assumptions in Eq. (1) are made, then it is possible to derive a model-dependent $\hat{r}_t$ from $\hat{R}_t$. The generalised method for connecting these two summary statistics is given in the left side of Eq. (3) (Wallinga and Lipsitch, 2007), with $M_w$ denoting the moment generating function of the generation time distribution (defined by the set of $w_j$ from Eq. (1)).

The left side of Eq. (3) states that the relationship between $\hat{r}_t$ and $\hat{R}_t$ depends strongly on the parametric form of the generation time (or in practice, the serial interval) distribution. The set of $w_j$ is most commonly parametrised from the gamma family of distributions with shape and scale parameters $a$ and $b$. This leads to the analytic expression on the right side of Eq. (3). Although the moment generating approach suggests a general way of connecting $\hat{r}_t$ and $\hat{R}_t$, there is an implicit exponential growth or decay assumption within this formula (Wallinga and Lipsitch, 2007). While Eq. (3) is developed for a general renewal model framework, we can also specialise this method to popular compartmental models. For example, under a linearised compartmental Susceptible-Infectious-Recovered (SIR) model, we obtain a linear relationship between $\hat{r}_t$ and $\hat{R}_t$ that is scaled by the mean generation time (Bettencourt and Ribeiro, 2008). We examine how the model-based $\hat{r}_t$ relates to $\hat{R}_t$ under a given generation time distribution (see Methods).

The gamma and SIR simplifications of Eq. (3) provide key insights into the relative informativeness of these statistics. First, we see that the signs of $\hat{R}_t - 1$ and $\hat{r}_t$ are equivalent, making either equally good for inferring the transitions between growing and declining epidemics. We illustrate this for a simulated epidemic in Fig. 1, which has been constructed to represent seasonal transmission dynamics.
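The conversions described above can be sketched as follows. Because the display form of Eq. (3) is not reproduced in this text, the sketch assumes the usual Wallinga and Lipsitch (2007) relation $R_t = 1 / M_w(-r_t)$, which for a gamma generation time with shape $a$ and scale $b$ gives $R_t = (1 + b\, r_t)^{a}$ and hence $r_t = (R_t^{1/a} - 1)/b$. The SIR case is taken as the special case of an exponential generation time (shape 1) with mean $g$, so that $r_t = (R_t - 1)/g$. The function names and the symbol $g$ are hypothetical.

```python
import math

def growth_rate_from_R_gamma(R, shape, scale):
    """Assumed gamma relation: r = (R**(1/shape) - 1) / scale."""
    return (R ** (1.0 / shape) - 1.0) / scale

def R_from_growth_rate_gamma(r, shape, scale):
    """Inverse of the above: R = (1 + scale * r)**shape."""
    return (1.0 + scale * r) ** shape

def growth_rate_from_R_sir(R, mean_gen_time):
    """Linearised SIR relation (exponential generation time): r = (R - 1) / g."""
    return (R - 1.0) / mean_gen_time

# Illustrative (made-up) parameters: shape 2.5, scale 4 time units
for R in (0.8, 1.0, 2.0):
    r = growth_rate_from_R_gamma(R, shape=2.5, scale=4.0)
    print(R, round(r, 4), math.copysign(1.0, r) if r != 0 else 0.0)
```

Under these assumed forms the sign of $r_t$ always matches the sign of $R_t - 1$, which is the equivalence noted in the text.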
We compute $\hat{R}_t$ using the EpiFilter method (Parag et al., 2020) (red, A) and validate our estimates with one-step-ahead incidence predictions (red, B). The intersections of $\hat{R}_t$ (red, A) with 1 and those of the model-based $\hat{r}_t$ (red, C) with 0 coincide, as expected from Eq. (3). Both provide consistent assessments of time-varying transmission, correctly signalling rising and falling seasons.

We next compute the model-agnostic, log-derivative based $\hat{r}_t$ using an SG filter as in Eq. (2), which effectively fits splines to the incidence curve. This estimate (grey, C) correlates well with our model-based one (red, C), with some overshoot in periods where incidence is small (and estimation is known to be more difficult). The only assumptions made in obtaining this estimate relate to how we smooth the data to obtain stable log-derivatives (e.g., we have to make choices about the order of our splines or the dimension of our moving filters). Current approaches to deriving model-agnostic $\hat{r}_t$ values must all ultimately make similar assumptions and choices (Pellis et al., 2020).

Having made these key observations, our main result emerges. Comparing Eq. (1) and Eq. (2), we see that the total infectiousness, $\Lambda_t$, is an implicit SG filter, with the value of $m$ determined by the support of the generation time distribution. Hence, we construct another growth rate estimate, $r_t \approx \frac{d \log \Lambda_t}{dt}$, as shown in Fig. 1 (blue, C). This estimate matches the other two growth rate estimates well but with a decreased overshoot. This correspondence is novel and, importantly, clarifies how time-varying model-agnostic growth rates and instantaneous reproduction numbers relate by exposing that the generation time distribution is effectively an epidemiologically informed smoothing filter. We confirm this by comparing $\Lambda_t$ and $S_t$ (grey and blue, D). These are written $\Lambda_{t-\tau}$ and $S_{t-\tau}$ to indicate that they have been shifted to remove lags, $\tau$, which naturally result from applying smoothing filters.
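The correspondence described above can be illustrated numerically. This is a minimal sketch under stated assumptions, not the authors' implementation: it takes $\Lambda_t = \sum_{j \ge 1} w_j I_{t-j}$ and, for simplicity, a one-sided moving average $S_t = \sum_{j=1}^{m} a_j I_{t-j}$ with $a_j = 1/m$, whereas the SG filters used for Fig. 1 may be centred and of higher polynomial order. The weights `w`, the window $m = 7$, the growth rate of 0.05 and all function names are hypothetical.

```python
import numpy as np

def smooth(incidence, weights):
    """One-sided filter: sum_{j>=1} weights[j-1] * I_{t-j} (zero with no history)."""
    incidence = np.asarray(incidence, dtype=float)
    weights = np.asarray(weights, dtype=float)
    out = np.zeros_like(incidence)
    for t in range(1, len(incidence)):
        k = min(t, len(weights))
        out[t] = np.dot(weights[:k], incidence[t - k:t][::-1])
    return out

def log_derivative(curve):
    """Finite-difference gradient of log(curve); NaN where the curve is not positive."""
    curve = np.asarray(curve, dtype=float)
    safe = np.where(curve > 0, curve, np.nan)
    return np.gradient(np.log(safe))

# Noise-free exponential growth at rate 0.05 per time unit (illustrative only)
t = np.arange(60)
incidence = 10.0 * np.exp(0.05 * t)

w = np.array([0.1, 0.3, 0.4, 0.2])  # made-up generation time weights (sum to 1)
a = np.full(7, 1.0 / 7.0)           # moving average kernel, a_j = 1/m with m = 7

r_from_S = log_derivative(smooth(incidence, a))       # arbitrary kernel
r_from_Lambda = log_derivative(smooth(incidence, w))  # epidemiological kernel

print(r_from_S[10:13], r_from_Lambda[10:13])
```

On noise-free exponential incidence both kernels recover the same growth rate once their windows are full, since each filtered curve is then a constant multiple of the incidence. With noisy or non-exponential data the two kernels would generally differ, which is where the choice of kernel starts to matter.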
Generally, $\tau$ is related to the mean generation time, and we do not provide confidence intervals for the two SG-based $\hat{r}_t$ here as we simply intend these results to demonstrate proof-of-concept.

FIGURE 1 Instantaneous reproduction numbers and growth rates. We simulate a seasonally varying epidemic with incidence $I_t$, according to the renewal model with true transmissibility $R_t$ and the serial interval distribution estimated for Ebola virus by Van Kerkhove et al. (2015). In panels A and B, we estimate the instantaneous reproduction number $\hat{R}_t$ (with 95% confidence intervals) using EpiFilter (see Parag et al. (2021)) and provide one-step-ahead predictions $\hat{I}_t$ using $\hat{R}_t$. In panels C and D, we derive three growth rate estimates, $\hat{r}_t$, using: $\hat{R}_t$ (via the Wallinga and Lipsitch (2007) approach), a smoothed-shifted version of the incidence curve $S_{t-\tau}$ (via SG filters) and a shifted version of the total infectiousness of the epidemic $\Lambda_{t-\tau}$, by treating it as a type of SG filter.

Evaluating time-varying changes in pathogen transmissibility is an important challenge, allowing the impact of public health interventions to be assessed and providing indicators that can inform policymaking during epidemics. We have focussed on two key metrics for tracking transmissibility: the instantaneous reproduction number $R_t$ (with estimate $\hat{R}_t$) and the instantaneous growth rate $r_t$ (with estimate $\hat{r}_t$). Both metrics provide key insights into the dynamics of epidemics, as demonstrated by their use during the COVID-19 pandemic (Anderson et al., 2020; Abbott et al., 2020). However, their relative merits and demerits have been increasingly debated.
Recent work has suggested that the benefits of inferring $\hat{r}_t$ might have been underappreciated, and that this quantity may be particularly useful because of its apparent independence from modelling assumptions and its explicit consideration of the epidemic speed (i.e., it more naturally includes temporal information) (Pellis et al., 2020; Dushoff and Park, 2021). Here, we have investigated and exposed the relationship between $\hat{R}_t$ and $\hat{r}_t$.

The relative informativeness of these two quantities during epidemics rests on the reliability of their smoothing and epidemiological assumptions. We found that both $\hat{R}_t$ and $\hat{r}_t$ extract signals of changing pathogen transmission by smoothing noise from the incidence curve. As shown in Fig. 1, their key difference lies in the kernel (i.e., the set of weights in the SG filter of Eq. (2)) used for this smoothing. Specifically, computing $\hat{r}_t$ in a model-agnostic way corresponds to selecting an arbitrary kernel, whereas calculating $\hat{R}_t$ (and, correspondingly, model-based $\hat{r}_t$ values) involves implicitly treating the generation time distribution as an epidemiological kernel (see Results and the right sides of Eq. (1) and Eq. (2)). As a result, if the generation time distribution is estimated accurately and underlying assumptions about pathogen transmission hold, then not only are both measures closely related, with the commonly cited $R_t = 1$ threshold corresponding to an $r_t = 0$ threshold, but $R_t$ is also theoretically more informative.
This follows because $\hat{R}_t$ can be used to derive correct model-based $\hat{r}_t$ values, while also providing additional insights into the mechanism of transmission underlying the observed incidence (Yan and Chowell, 2019). In contrast, starting from the model-agnostic $\hat{r}_t$, it does not seem possible to derive $\hat{R}_t$ without epidemiological assumptions. However, should the generation time distribution be misspecified, then $\hat{R}_t$ could be biased, in which case the model-agnostic $\hat{r}_t$ might be more informative.

When constructing $\hat{R}_t$, the generation time distribution is often approximated by the serial interval distribution. Misspecification of the generation time as described above might arise due to the often limited number of observed serial intervals used to estimate the serial interval distribution. Observed serial intervals are commonly obtained from household or contact tracing studies, where it is possible to identify source-recipient transmission pairs (Cowling et al., 2009; Li et al., 2020). However, as case numbers increase, identifying source-recipient pairs becomes more challenging: there is less certainty about the source of a given transmitted infection, and the risk of infection from an unknown source in the community cannot be ignored. Moreover, even if sufficient source-recipient pairs are reliably known, the generation time may still be misspecified. Non-pharmaceutical interventions and public health measures, such as case isolation after symptom onset, may curtail observed serial intervals (Ali et al., 2020) or increase the proportion of cases caused by pre-symptomatic transmission (Sun et al., 2021). In both scenarios it becomes difficult to reliably approximate the generation time distribution with the serial interval distribution (which is also now time-varying and may even have negative values).
While recent approaches try to compensate for some of these issues (Ganyani et al., 2020) or allow the inclusion of up-to-date distributions (Thompson et al., 2019), accurately relating $\hat{r}_t$ to $\hat{R}_t$ may not always be simple in practice.

Despite potential issues when obtaining $\hat{R}_t$, we have made clear that inferring the model-agnostic $\hat{r}_t$ also requires assumptions related to smoothing of the incidence curve (or log incidence curve) and specification of the time interval over which to estimate a particular $\hat{r}_t$. Furthermore, when case numbers are increasing, $\hat{r}_t$ does not give an indication of the proportion of current transmissions that must be blocked to prevent an epidemic from continuing to grow. This proportion, defined relative to $R_0$, is known as the herd immunity threshold. This threshold is used to determine the vaccine coverage required in order to control transmission, accounting for vaccine effectiveness and any infection-acquired immunity (Hethcote, 2000; Thompson et al., 2020). On the other hand, it is $\hat{r}_t$ that naturally gives estimated doubling times (or halving times), which may be important for intervention planning purposes.

There are also a number of factors that limit the informativeness of both $\hat{R}_t$ and $\hat{r}_t$. First, reporting errors and delays can lead to imprecise case counts, affecting summary statistics derived from incidence curves (Azmon et al., 2014). Second, both of the statistics discussed here relate to averages, but heterogeneous systems with superspreading individuals or events (Lloyd-Smith et al., 2005) require more than a measure of central tendency to be well understood. Inferring pathogen transmissibility and the potential impacts of interventions therefore often requires more complex modelling approaches. Last, it is not only $\hat{r}_t$ that requires time windows to be chosen for estimation. Values of $\hat{R}_t$ are often calculated over shifting time windows. Short windows may lead to fluctuating $\hat{R}_t$ values that potentially reflect randomness in contacts between hosts rather than variations in transmissibility, while long windows may blur detection of key variations (Cori et al., 2013).

The above problems relate to fundamental bias-variance tradeoffs in the inference of $r_t$ and $R_t$ and emphasise that neither measure should be used naively. As highlighted by Lloyd (2009), sensitivity analyses of the structure of the epidemiological model or statistical procedure used are crucial for drawing reliable inferences from noisy data. It should also be noted that even if these problems do not exist, other contextual information is still often required to obtain a full picture of an ongoing epidemic. For example, while $R_t = 1$ (or equivalently $r_t = 0$) may indicate a stable epidemic, the policy response may be very different depending on whether incidence is high or low. The first of these scenarios may not be acceptable to policymakers, as it involves large numbers of infections in the near future. Both $R_t$ and $r_t$ only provide information about the changes in state of an epidemic.

Nonetheless, despite some of the challenges in estimation and the need for contextual information, we contend that both $\hat{R}_t$ and $\hat{r}_t$ are valuable. Estimates of $R_t$ (widely referred to as the "R number") are particularly useful as an intuitive measure for public communication, allowing the effects of current interventions to be assessed and communicated straightforwardly.
However, estimates of $r_t$, expressed as doubling times, are well suited to conveying the speed at which cases are increasing. Given the risks of depending on either $R_t$ or $r_t$ that we have explored in this paper and the complementary roles they can play in raising public awareness, we support current efforts to generate estimates of both summary statistics. These quantities in combination, together with contextual measures such as current incidence or prevalence, allow epidemic dynamics to be understood more clearly and completely.

Temporal variation in transmission during the COVID-19 outbreak
Serial interval of SARS-CoV-2 was shortened over time by nonpharmaceutical interventions
Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK: methods of estimation, data sources, causes of heterogeneity, and use as a guide in policy formulation
Infectious diseases of humans: dynamics and control
On the estimation of the reproduction number based on misreported epidemic data
Real time Bayesian estimation of the epidemic potential of emerging infectious diseases
A new framework and software to estimate time-varying reproduction numbers during epidemics
Estimation of the serial interval of influenza
Speed and strength of an epidemic intervention
Estimating individual and household reproduction numbers in an emerging epidemic
Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data
The mathematics of infectious diseases
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Sensitivity of model-based epidemiological parameter estimation to model assumptions
Superspreading and the effect of individual variation on disease emergence
Deciphering early-warning signals of the elimination and resurgence potential of SARS-CoV-2 from limited data at multiple scales. medRxiv
Using information theory to optimise epidemic models for real-time prediction and estimation
An exact method for quantifying the reliability of end-of-epidemic declarations in real time
Challenges in control of COVID-19: short doubling times and long delay to effect of interventions
Smoothing and differentiation of data by simplified least squares procedures
Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2
Key questions for modelling COVID-19 exit strategies
Improved inference of time-varying reproduction numbers during infectious disease outbreaks
A review of epidemiological parameters from Ebola outbreaks to inform early public health decision-making
How generation intervals shape the relationship between growth rates and reproductive numbers
Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
Statistical estimation of the reproductive number from case notification data