title: Quantifying the information in noisy epidemic curves
authors: Parag, K. V.; Donnelly, C. A.; Zarebski, A. E.
date: 2022-05-16
doi: 10.1101/2022.05.16.22275147

Reliably estimating the dynamics of transmissible diseases from noisy surveillance data is an enduring problem in modern epidemiology. Key parameters, such as the time-varying reproduction number, R_t at time t, are often inferred from incident time series, with the aim of informing policymakers on the growth rate of outbreaks or testing hypotheses about the effectiveness of public health interventions. However, the reliability of these inferences depends critically on reporting errors and latencies innate to those time series. While studies have proposed corrections for these issues, methodology for formally assessing how these noise sources degrade R_t estimate quality is lacking. By adapting Fisher information and experimental design theory, we develop an analytical framework to quantify the uncertainty induced by under-reporting and delays in reporting infections. This yields a novel metric, defined by the geometric means of reporting and cumulative delay probabilities, for ranking surveillance data informativeness. We apply this metric to two primary data sources for inferring R_t: epidemic case and death curves. We show that the assumption of death curves as more reliable, commonly made for acute infectious diseases such as COVID-19 and influenza, is not obvious and possibly untrue in many settings. Our framework clarifies and quantifies how actionable information about pathogen transmissibility is lost due to surveillance limitations.

The time-varying effective reproduction number, denoted R_t at time t, is an important and popular measure of the transmissibility of an unfolding infectious disease epidemic [1]. This parameter defines the average number of secondary infections generated by a primary case at t, providing a critical threshold for delineating growing epidemics (R_t > 1) from those likely to become controlled (R_t < 1). Estimates of R_t derived from surveillance data are widely used to evaluate the efficacies of interventions [2, 3] (e.g., lock-downs), forecast upcoming disease burden [4, 5] (e.g., hospitalisations), inform policymaking [1] and improve public situational awareness [6]. The reliability of these estimates depends fundamentally on the quality and timeliness of available surveillance data. Practical epidemic monitoring is subject to various errors or imperfections that can obscure or bias inferred transmission dynamics [7]. Prime among these are under-reporting and reporting delays, which can scale and smear R_t estimates, potentially misinforming public health authorities [8, 9]. The ideal data source for estimating R_t is the incident time series of infections, I_t. Unfortunately, as infections are rarely observed directly, the epidemic curve of reported cases, C_t, or that of death counts, D_t, is commonly used as a proxy [10, 11]. Both provide noisy approximations of the unknown I_t but with different and important relative advantages. The epidemic case curve C_t records the most routinely available data, i.e., counts of new cases [12], but is often limited by delays and under-reporting. Ascertainment delays smear or reorder case incidence and may emerge from fixed surveillance capacities, weekend effects and lags in diagnosing symptomatic patients [8, 13].
Delays may be classed as occurred but not yet reported (OBNR), when the source times of delayed cases eventually become known (i.e., delays essentially cause right censoring of the case counts), or what we term never reported (NEVR), when source times are not uncovered [14, 15]. Case under-reporting or under-ascertainment strongly distorts the true, but unknown, infection incidence curve, altering its size and shape [9, 16]. Temporal fluctuations in testing, behaviour-linked reporting (e.g., based on the severity of symptoms) [17], detection bottlenecks and other surveillance biases can lead to under-ascertainment and inconsistent reporting [18]. Constant reporting (CONR) describes when the case detection fraction or probability is stable. We use the term variable reporting (VARR) for the more realistic scenario where this probability varies appreciably with time. Death time series, D_t, count newly reported deaths attributable to the pathogen being studied and are also subject to under-reporting and reporting delays, but with two key differences [10]. First, death reporting delays incorporate an extra lag for the intrinsic time it takes an infection to culminate in mortality (this also subsumes hospitalisation periods). Second, apart from the under-reporting fraction of deaths, there is another scaling factor known as the infection fatality ratio, which defines the proportion of infections that result in mortality [2, 11]. We visualise how the noise types underlying epidemic and death curves distort infection incidence in Fig. 1.

Although the influences of surveillance latencies and under-ascertainment fractions on key parameters, such as R_t, are well known [8, 19-21] and much ongoing work attempts to compensate for these noise sources [10, 22-24], there exists no formal framework for assessing and exposing how they inherently limit the information available for estimating epidemic dynamics. Most studies utilise simulation-based approaches (with some exceptions, e.g., [9, 20]) to characterise surveillance defects, which, while invaluable, preclude generalisable insights into how epidemic monitoring shapes parameter inference. Here we develop one such analytic framework. Using Fisher information theory we derive a measure of how much usable information an epidemic time series contains for inferring R_t at every time. This yields metrics for cross-comparing different types of surveillance time series, as we are able to explicitly quantify how under-reporting (both CONR and VARR) and reporting delays (exactly for OBNR, with a tight upper bound for NEVR) degrade the available information. As this metric only depends on the properties of surveillance (and not R_t or I_t) we extract simulation-agnostic insights into what are the least and most detrimental types of surveillance noise. We prove, for constrained mean reporting fractions and delays, that it is preferable to minimise variability among reporting fractions but to maximise the heterogeneity of the reporting delay distribution (such that a minority of infections face large delays but the majority possess short lags to notification). This proceeds from standard experimental design theory applied to our metric, which shows that the information embedded within an epidemic curve depends on the product of the geometric means of the reporting fractions and cumulative delay probabilities corrupting that curve. This central result also provides a non-dimensional score for summarising and ranking the reliability of (or uncertainty within) different surveillance data for inferring pathogen transmissibility.
Fig. 1: Under-reporting and delayed reporting noise. We simulate true infection incidence I_t (black) from a renewal model (Eq. (1) with Ebola virus dynamics) with a reproduction number R_t that switches from supercritical to subcritical spread due to some intervention. Panel A shows under-reported case curves (50 realisations, various colours) with reporting fractions sampled from the distribution in the inset. We observe stochastic trajectories and appreciable under-counting of peak incidence. Panel B considers delays in case reports (50 realisations, various colours) from the distribution plotted in the inset. We find variability and a smearing of the sharp change in incidence due to R_t (also provided as an inset). The main question of this study is how we quantify which of these two scenarios incurs a larger loss of the information originally available from I_t, ideally without simulation.

Last, we apply this framework to explore and critique a common claim in the literature, which asserts that death curves are more robust for inferring transmissibility than case curves. This claim, which as far as we can tell has never been formally verified, is usually made for acute infectious diseases such as COVID-19 and pandemic influenza [2, 11], where cases are severely under-reported, with symptom-based fluctuations in reporting. In such settings it seems plausible to reason that deaths are less likely to be under-counted and more reliable for R_t inference. However, we input COVID-19 reporting rate estimates [18, 25] within our metric and discover few instances in which death curves are definitively more informative than case counts. While this may not rule out that possibility, it elucidates and exposes how different noise terms within both data sources corrupt information, providing new methodology for exploring these types of questions more precisely. We also outline how other common data such as hospitalisations, prevalence and wastewater virus surveys conform to our framework. Hopefully the tools we developed here will improve quantification of noise and information and highlight the areas where enhanced surveillance strategies can maximise impact.

The renewal model [26, 27] is a popular approach for describing how infections dynamically propagate during the course of an epidemic. The number of newly infected cases at time t, I_t, depends on the effective reproduction number, R_t, which counts the new infections generated per infected individual (on average), and the total infectiousness, Λ_t, which measures how many past infections (up to time t−1) will effectively produce new ones. This measurement weighs past case counts by the generation time distribution, w, which we assume to be known. We define w_s as the probability that it takes s time units for a primary case to generate a secondary case. The distribution is then w = w_1^∞ := {w_1, w_2, . . ., w_∞}. The statistical relationship between these variables is often defined as in Eq. (1), with the symbol Pois specifying a Poisson distribution [19]. This relationship strictly holds when I_t is perfectly observed both in size (no under-reporting) and in time (no delays in reporting). Eq. (1) has been widely used to model transmission dynamics of many infectious diseases, including COVID-19 [1], influenza [28] and Ebola virus disease [29].
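Eq. (1) itself is not reproduced in this extract; a minimal sketch of the renewal process it describes, I_t ∼ Pois(Λ_t R_t) with Λ_t = ∑_{s≥1} w_s I_{t−s}, is given below. The generation time distribution, the step change in R_t and all other numbers are illustrative assumptions (loosely mirroring the Fig. 1 setup), not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (illustrative) generation time distribution w over lags 1..20.
support = np.arange(1, 21)
w = support**2 * np.exp(-support / 3.0)      # gamma-like shape, hypothetical
w = w / w.sum()

tau = 100
R = np.where(np.arange(tau) < 40, 2.0, 0.6)  # R_t steps from supercritical to subcritical

I = np.zeros(tau)
I[0] = 10                                     # seed infections
for t in range(1, tau):
    past = I[max(0, t - len(w)):t][::-1]      # I_{t-1}, I_{t-2}, ...
    Lam = np.sum(w[:len(past)] * past)        # total infectiousness Lambda_t
    I[t] = rng.poisson(Lam * R[t])            # renewal model of Eq. (1)

print(I.astype(int))
```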
A common and important problem in infectious disease epidemiology is the estimation of the latent variable R_t in real time from the observed incidence curve of infections. If this time series persists over 1 ≤ t ≤ τ, with τ as the present time, then we want to estimate the vector of parameters R_1^τ := {R_t : 1 ≤ t ≤ τ}. We assume that time is measured in appropriate units such that R_t is expected to change (independently) at every time step. As illustrations, this time step can be weekly for COVID-19 or Severe Acute Respiratory Syndrome (SARS) [2, 19] but monthly for diseases like rabies [30]. Following the development in [28, 31], we solve this inference problem by constructing the incidence log-likelihood function ℓ(R_1^τ) = log P(I_1^τ | R_1^τ) as in Eq. (2), with K_τ as a constant that does not depend on any R_t. We compute the maximum likelihood estimate (MLE) of R_t as R̂_t, which is the maximal solution of ∂ℓ(R_1^τ)/∂R_t = 0 [26]. Repeating this for all t we obtain estimates of the complete vector of transmissibility parameters R_1^τ underlying I_1^τ. To quantify the precision (the inverse of the variance, var) around these MLEs or any unbiased estimator of R_t we calculate the Fisher information (FI), which is taken across the data I_1^τ (hence the subscript I). The FI defines the best (smallest) possible uncertainty asymptotically achievable by any unbiased estimate, R̂_t. This follows from the Cramer-Rao bound [32], which states that var(R̂_t) ≥ F_I(R_t)^{-1}. The confidence intervals around R̂_t converge to R̂_t ± 1.96 F_I(R_t)^{-1/2}. The FI also links to the Shannon mutual information that I_1^τ contains about R_t [33, 34] and is pivotal to describing both model identifiability and complexity [32, 35]. Using the Poisson renewal log-likelihood in Eq. (2) we compute the FI as the left equality in Eq. (3). Observe that this depends on the unknown 'true' R_t. This reflects the heteroscedasticity of Poisson models, where the estimate mean and variance are co-dependent. We construct a square root transform that uncouples this dependence [31], yielding the right formula in Eq. (3). We can evaluate F_I(2√R_t) purely from I_1^τ. The result follows from the Fisher information change of variables formula F_I(R̃_t) = F_I(R_t) (∂R_t/∂R̃_t)^2 [32]. This transformation has several optimal statistical properties [36, 37] and so we will commonly work with R̃_t := 2√R_t. As we are interested in evaluating the informativeness or reliability of the entire I_1^τ time series for inferring transmission dynamics, we require the total FI it provides for all estimable reproduction numbers, R_1^τ. As we noted above, the inverse of the square root of the FI for a single R_t corresponds to an uncertainty (or confidence) interval. Generalising this to multiple dimensions yields an uncertainty ellipsoid with volume inversely proportional to the square root of the determinant of the FI matrix [35, 36]. This matrix has diagonals given by F_I(R_t) and off-diagonals defined as E[−∂²ℓ(R_1^τ)/∂R_t∂R_s] for 1 ≤ t, s ≤ τ. Maximising this non-negative determinant, which we denote the total information T(I_1^τ) from the data I_1^τ, corresponds to what is known as a D-optimal design [38].
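The displayed equations are missing from this extract; assembling the definitions above, a hedged reconstruction of the Poisson renewal log-likelihood, MLE and FI that Eq. (2) and Eq. (3) refer to is:

```latex
\ell(R_1^\tau) = \log P(I_1^\tau \mid R_1^\tau)
  = K_\tau + \sum_{t=1}^{\tau} \left( I_t \log R_t - \Lambda_t R_t \right),
  \qquad \hat{R}_t = \frac{I_t}{\Lambda_t},

F_I(R_t) = \mathbb{E}\!\left[ -\frac{\partial^2 \ell(R_1^\tau)}{\partial R_t^2} \right]
  = \frac{\Lambda_t}{R_t},
  \qquad
F_I(\tilde{R}_t) = \Lambda_t \quad \text{with } \tilde{R}_t := 2\sqrt{R_t}.
```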
This D-optimal design minimises the overall asymptotic uncertainty around estimates of the vector R_1^τ. As the renewal model in Eq. (1) treats every R_t as independent, off-diagonal terms are 0 and T(I_1^τ) is a product of the diagonal FI terms. Transforming R_t → R̃_t we then obtain Eq. (4). If we work directly in R_t we get ∏_{t=1}^τ Λ_t R_t^{-1} instead. In two dimensions (i.e., τ = 2) our ellipsoid becomes an ellipse and Eq. (4) intuitively means that its area is proportional to a product of lengths F_I(R_1)^{-1/2} F_I(R_2)^{-1/2}, which factors in the uncertainty from each estimate. We will use this recipe of formulating a log-likelihood for R_1^τ given some data source and then computing the total information, T(.), it provides about these parameters to quantify the reliability of case, death and other time series for inferring transmissibility. Comparing data source quality will involve ratios of these total information terms. Metrics such as Eq. (4) are valuable because they measure the usable information within a time series and also delimit the possible distributions that a model can describe given that data (see [35, 39] for more on these ideas, which emerge from information geometry). Transforms like R̃_t = 2√R_t stabilise these metrics (i.e., maximise robustness) to unknown true values [36, 37].

We investigate two important and common sources of noise, under-reporting and reporting delay, which limit our ability to precisely monitor I_1^τ, the true time series of new infections. We quantify how much information is lost due to these noise processes by examining how these imperfections degrade T(I_1^τ), the total information obtainable from I_1^τ under perfect (noiseless) surveillance for estimating the parameter vector R_1^τ (see Eq. (4)). Fig. 1 illustrates how these two main noise sources individually alter the shape and size of incidence curves.

(i) Under-reporting or under-ascertainment. Practical surveillance systems generally detect some proportion of the true number of cases occurring at any given time t. If this proportion is ρ_t ≤ 1 then the number of cases observed, C_t, is generally modelled as C_t ∼ Bin(I_t, ρ_t) [22, 40], where Bin indicates the binomial distribution. The under-reported fraction is 1 − ρ_t and so the reported case count C_t ∼ Pois(ρ_t Λ_t R_t). Reporting protocols are defined by choices of ρ_t. Constant reporting (CONR) is the simplest and most popular, assuming every ρ_t = ρ [19]. Variable reporting (VARR) describes general time-varying protocols where every ρ_t can differ [41].

(ii) Reporting delays or latencies. There can be notable lags between the onset of an infected case and when it is reported [14]. If δ defines the distribution of these lags, with δ_x as the probability of a delay of x ≥ 0 time units, then the cases reported at t, C_t, sum the cases actually occurring at t but not delayed and those from previous days that were delayed [10]. This is commonly modelled as C_t ∼ Pois(∑_{x≤t} δ_{t−x} Λ_x R_x) [11, 41] and means that the true incidence I_t splits over future times as Mult(I_t, δ), where Mult indicates the multinomial distribution [13]. The C_t time series is occurred but not yet reported (OBNR) if we later learn about the past I_t splits (right-censoring); else we say data are never reported (NEVR).
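A minimal sketch of the two noise mechanisms (i)-(ii), applied to a stand-in incidence curve; the reporting fractions and delay distribution are hypothetical choices, and `I` could equally be the output of the renewal simulation above.

```python
import numpy as np

rng = np.random.default_rng(2)

def under_report(I, rho):
    """(i) Binomial thinning: C_t ~ Bin(I_t, rho_t)."""
    return rng.binomial(I.astype(int), rho)

def delay_reports(I, delta):
    """(ii) Multinomial delays: each I_t is split over future report times by delta."""
    tau, C = len(I), np.zeros(len(I))
    for t in range(tau):
        lags = rng.multinomial(int(I[t]), delta)   # cases delayed by x = 0, 1, ...
        for x, n in enumerate(lags):
            if t + x < tau:                        # reports landing beyond tau are unseen
                C[t + x] += n
    return C

I = rng.poisson(50, size=60).astype(float)         # stand-in incidence curve
rho = np.full(60, 0.4)                             # CONR with rho = 0.4 (hypothetical)
delta = np.array([0.5, 0.3, 0.15, 0.05])           # short delay, most mass at zero (hypothetical)
C = delay_reports(under_report(I, rho), delta)
print(C.astype(int))
```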
We make some standard assumptions [8, 12, 19, 41] in incorporating the above noise sources within renewal model frameworks. We consider stationary generation time and reporting delay distributions only, i.e., w and δ do not vary with time, and we neglect co-dependencies between reporting and transmissibility. Additionally, we assume these distributions and all reporting or ascertainment fractions, ρ_t, are inferred or known from other data (e.g., contact tracing studies) [13]. In the absence of these assumptions R_1^τ would be non-identifiable. We next examine how (i)-(ii) in combination limit the information available about epidemic transmissibility.

We denote the empirically observed or reported number of cases at time t, subject to noise from both under-reporting and reporting delays, as C_t, with C_1^τ := {C_t : 1 ≤ t ≤ τ} as the epidemic case curve. This curve is obtained from routine outbreak surveillance and is a corrupted version of the true incidence I_1^τ [10], modelled by Eq. (1). These noise sources (see Methods for statistical descriptions) are parametrised by reporting fractions ρ_1^τ and a delay distribution δ, which we assume to be known from other data (e.g., line-lists) [13, 42]. As a result, we can construct Eq. (5) as in [24] (see Methods). This noisy renewal model suggests that C_t (unlike I_t) contains partial information about the entire time series of reproduction numbers R_x for x ≤ t, as mediated by the delay and reporting probabilities. Perfect reporting corresponds to ρ_x = 1 for all x as well as δ_0 = 1 with δ_x = 0 for x > 0. The models in (i)-(ii) of the Methods are obtained by individually removing noise sources from Eq. (5). Other practical epidemic surveillance data, such as the time series of new deaths or hospitalisations, conform to the framework in Eq. (5) either directly or with additional effective delay and under-reporting stages [11]. The main one we investigate here is the count of new deaths (due to infections) across time, which we denote D_1^τ. The death curve involves a reporting delay that includes the intrinsic lag from infection to death. We let γ represent the distribution of time to observed death and use σ_1^τ for the fraction of deaths that are reported. An important additional component when describing the chain from I_1^τ to D_1^τ is the infection fatality ratio, ifr_t, which is the probability at time t that an infection culminates in a death event [10]. Fusing these components yields Eq. (6) as a model for death counts D_t. In a later section we also explain how this description fits other data streams such as hospitalisations and prevalence. Some studies [2, 43] replace this Pois formulation with a negative binomially (NB) distributed one to model extra variance in these data streams. In the Appendix we show that this does not disrupt our subsequent results on the relative informativeness of surveillance data (though the NB formulation is less tractable and unsuitable for extracting generalisable, simulation-free insights).

We derive the FI of our vector of effective reproduction numbers R_1^τ given the case curve C_1^τ, which follows Eq. (5). We initially assume that reporting delays are OBNR, i.e., that we eventually learn the source time of cases at a later date. This corresponds to a right censoring that can be compensated for using nowcasting techniques [14].
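For reference, a hedged reconstruction of the observation models referred to as Eq. (5) and Eq. (6), assembled from the definitions above and the component decomposition stated in the Appendix:

```latex
C_t \sim \mathrm{Pois}\!\left( \sum_{x=1}^{t} \delta_{t-x}\, \rho_x\, \Lambda_x R_x \right)
  \quad \text{(cases, Eq. (5))},
\qquad
D_t \sim \mathrm{Pois}\!\left( \sum_{x=1}^{t} \gamma_{t-x}\, \sigma_x\, \mathrm{ifr}_x\, \Lambda_x R_x \right)
  \quad \text{(deaths, Eq. (6))}.
```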
Later we prove that this OBNR assumption not only defines a practical noise model but also serves as an upper bound on the information available from NEVR delays, where the true timestamps of cases are unresolvable. Mathematically, the OBNR assumption lets us decompose the sum in Eq. (5). We can therefore identify the component of C_t that is informative about R_x; this follows from the decomposition of C_t into independent Pois(δ_{t−x} ρ_x Λ_x R_x) components for x ≤ t (see Appendix). As we are interested in the total information that C_1^τ contains about every R_t, we collect and sum contributions from every C_t. We can better understand this process by constructing the matrix Q in Eq. (7), which expands the convolution of the reporting fractions with the delay probabilities over the entire observed time series. We consider the vector µ = [µ_τ, µ_τ−1, . . ., µ_1]ᵀ with µ_t = Λ_t R_t and ᵀ denoting the transpose operation. Then Qµ gives the means of the reported case incidence, with its element for time t as the mean of C_t. The components of C_1^τ that contain information about every given reproduction number follow from the column sums of Q applied to µ. The elements of this vector are Poisson means formed by collecting and summing the components of C_1^τ that inform about [R_τ, R_τ−1, . . ., R_1], respectively. Hence we obtain the key relationship in Eq. (8). The ability to decompose the row or column sums from Q into the Poisson relationships of Eq. (8) is a property of the independence assumptions of renewal models and the infinite divisibility of Poisson formulations. Using Eq. (8) and analogues of the Poisson log-likelihood definitions from the Methods, we derive the Fisher information that C_1^τ contains about R_t as in Eq. (9). As in Eq. (3), we recompute the FI in Eq. (9) under the square root transform R̃_t = 2√R_t, which removes its dependence on the unknown R_t. It is clear that under-reporting and delays can substantially reduce our information about reproduction numbers. As we might expect, if ρ_t = 0 (no reports at time unit t) or F_{τ−t} = 0 (all delays are larger than τ − t) then we have no information on R_t at all from the time series C_1^τ. The MLE, R̂_t, also follows from Eq. (8); computing it is equivalent to applying a nowcasting correction as in [13, 14]. An important point to make here is that while such corrections can remove bias, allowing inference despite these noise sources, they cannot improve on the information (in this case Eq. (9)) inherently available from the data. This is known as the data processing inequality [44, 45].

If we cannot resolve the components of every C_t from Eq. (5), the reporting delay is classed as NEVR (i.e., we never uncover case source dates). Hence we know the sums Qµ but not the individual terms composing them. Accordingly, we must use Eq. (5) to construct an aggregated log-likelihood ℓ(R_1^τ) = log P(C_1^τ | R_1^τ), as in Eq. (10). We ignore constants that do not depend on any R_t in this likelihood. For every given R_t, this log-likelihood separates into a remainder a_t, which collects all terms that are not informative about that specific R_t, and terms from times s ≥ t that do inform about R_t. Here s ≥ t simply indicates that information about R_t is distributed across later times due to the reporting delays. We can then obtain the FI contained in C_1^τ about R_t, as in Eq. (11). If we could decouple the interactions among the reproduction numbers then the b_x terms (which appear within Eq. (11)) would disappear and we would recover the expressions derived under OBNR delay types.
Since b_x is a function of other reproduction numbers, the overall FI matrix for R_1^τ is not diagonal (there are non-zero terms from evaluating E[−∂²ℓ(R_1^τ)/∂R_t∂R_s] for s ≠ t). However, we find that this matrix can be reduced to a triangular form with determinant equal to the product of terms (across t) in Eq. (11). We show this for the example case of τ = 3 in the Appendix. As a result, the FI term for R_t in Eq. (11) behaves like, and corresponds to, that in Eq. (9). Interestingly, as b_x ≥ 0, Eq. (11) yields the revealing inequality that the FI for each R_t under NEVR delays is no larger than its OBNR counterpart ρ_t F_{τ−t} Λ_t R_t^{-1}. This proves that OBNR delays upper bound the information available from NEVR delays. Last, we note that robust transforms cannot be applied to remove the dependence of Eq. (11) on the unknown R_t parameters. The best we can do is evaluate Eq. (11) at the MLEs R̂_t for all t. These MLEs emerge as the joint maxima of the set of coupled equations ∂ℓ(R_1^τ)/∂R_t = 0, given in Eq. (12). Here sums start at t as they include only time points that contain information about R_t. Expectation-maximisation algorithms, such as the deconvolution approaches outlined in [10], are viable means of computing these MLEs or equivalents. Note that the nowcasting methods used to correct for OBNR delays do not help here [13].

Having derived the FI for each effective reproduction number above, we now provide a measure of the total information that C_1^τ provides about R_1^τ or the transformed R̃_1^τ. As detailed in the Methods, this total information, T(C_1^τ), relates inversely to the smallest joint uncertainty around unbiased estimates of all our parameters [35]. As larger T(C_1^τ) implies reduced overall uncertainty, this is a rigorous measure of the statistical reliability of noisy data sources for inferring pathogen transmissibility. Use of this or related metrics for quantifying the information in noisy epidemic data is novel (as far as we can tell). We first consider the OBNR delay case under arbitrarily varying (VARR) reporting rates. Since the FI matrix under OBNR delays is diagonal, with each element given by Eq. (9), we can adapt Eq. (4) to derive Eq. (13). Here we have applied the R̃_t = 2√R_t transformation to show that the total information in this noisy stream can be obtained without knowing R_t; in the absence of this transform Eq. (13) would depend on the unknown R_1^τ. As C_1^τ is a distortion of the true infection incidence I_1^τ, we normalise Eq. (13) by Eq. (4) to obtain a new reliability metric, η(C_1^τ) := T(C_1^τ) T(I_1^τ)^{-1}. This is given in Eq. (14) and is valid under both R_t and R̃_t. We can relate this reliability measure to an effective (fixed) reporting fraction, θ(C_1^τ), which causes an equivalent information loss. Applying Eq. (14) gives η(C_1^τ) = θ(C_1^τ)^τ, which yields Eq. (15). Here G(.) indicates the geometric mean of its arguments over 1 ≤ t ≤ τ. Eq. (15) is a central result of this work. It states that the total information content of a noisy epidemic curve is independently modulated by the geometric mean of its reporting fractions, G(ρ_t), and that of its cumulative delay probabilities, G(F_{τ−t}). Moreover, Eq. (15) provides a framework for gaining analytic insights into the influences of both noise sources from different surveillance data and for ranking the quality of those diverse data. Eq. (15) applies to OBNR delays exactly and upper bounds the reliability of epidemic curves with NEVR delays (see the previous section). Tractable results for NEVR delays are not possible and require numerical computation of Hessian matrices of −log P(C_1^τ | R_1^τ) (we outline log-likelihoods and other equations for a τ = 3 example in the Appendix).
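The corresponding displays are also missing here; with F_{τ−t} := ∑_{x=0}^{τ−t} δ_x denoting the cumulative delay probability, a hedged reconstruction of the OBNR results referred to as Eq. (9) and Eqs. (13)-(15) is:

```latex
F_C(R_t) = \frac{\rho_t F_{\tau-t} \Lambda_t}{R_t},
\qquad
F_C(\tilde{R}_t) = \rho_t F_{\tau-t} \Lambda_t
\quad \text{(Eq. (9))},

\mathbb{T}(C_1^\tau) = \prod_{t=1}^{\tau} \rho_t F_{\tau-t} \Lambda_t
\quad \text{(Eq. (13))},
\qquad
\eta(C_1^\tau) = \frac{\mathbb{T}(C_1^\tau)}{\mathbb{T}(I_1^\tau)}
  = \prod_{t=1}^{\tau} \rho_t F_{\tau-t}
\quad \text{(Eq. (14))},

\theta(C_1^\tau) = \eta(C_1^\tau)^{1/\tau} = G(\rho_t)\, G(F_{\tau-t})
\quad \text{(Eq. (15))}.
```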
However, we find that the Eq. (15) upper bound is tight for two fundamental settings. The first is under a constant or deterministic delay of d, i.e., δ_{x=d} = 1. Eq. (5) then reduces to C_t ∼ Pois(ρ_{t−d} Λ_{t−d} R_{t−d}). As each C_t only informs on R_{t−d}, OBNR and NEVR delays are the same (and corrected by truncation). The second occurs when transmissibility is constant or stable, i.e., R_t = R for all t. We can sum Eq. (9) to get F_C(R) = ∑_{t=1}^τ ρ_t F_{τ−t} Λ_t R^{-1} for OBNR delays. We can calculate the FI for NEVR delays from Eq. (10), which admits a derivative ∂ℓ(R)/∂R = ∑_{t=1}^τ (C_t R^{-1} − F_{τ−t} ρ_t Λ_t) and hence a FI and MLE that are precisely equal to those for OBNR delays. This proves a convergence in the impact of two fundamentally different delay noise sources and emphasises that noise has to be contextualised with the complexity of the signal to be inferred. Simpler signals, such as a stationary R that remains robust to the shifts and reordering of I_1^τ due to delays, may be notably less susceptible to fluctuations in noise probabilities.

The metric proposed in Eq. (15) provides an original and general framework for scoring proxies of incidence (e.g., epidemic case curves, death counts, hospitalisations and others) using only their noise probabilities and without the need for simulations. We explore the implications of Eq. (15) both for understanding noise and for ranking those proxies. The geometric mean decomposition allows us to separately dissect the influences of under-reporting and delays. We start by applying experimental design theory [36, 38] to characterise the best and worst noise types for inferring effective reproduction numbers. We consider G(ρ_t), the geometric mean of the reporting probabilities across time. If we assume the average sampling fraction ρ̄ = (1/τ) ∑_{t=1}^τ ρ_t is fixed (e.g., by some overall surveillance capacity) then we immediately know from design theory that the constant allocation ρ_t = ρ̄ attains max G(ρ_t). This means that of all the possible distributions of sampling fractions ρ_1^τ fitting that constraint, CONR or constant reporting with probability ρ̄ is the most informative [46]. This result is new but supports earlier studies recognising that CONR is preferred to VARR, although they investigate estimator bias and not information loss [9, 19]. Accordingly, we also discover that the worst sampling distribution is maximally variable. This involves setting ρ_t ≈ 1 for some time subset S such that ∑_{t∈S} ρ_t = τρ̄, with all other ρ_t ≈ 0 (we use approximate signs as we assume non-zero sampling probabilities). Relaxing this constraint, Eq. (15) presents a framework for comparing different reporting protocols. We demonstrate these ideas in Fig. 2, where ρ_t ∼ Beta(a, b), i.e., each reporting fraction is a sample from a Beta distribution. Reporting protocols differ in (a, b) choices. We select 10^4 ρ_t samples each from 2000 distributions with 10^{-1} ≤ b ≤ 10^2 and a computed to fulfil the mean constraint ρ̄.
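Before turning to Fig. 2, a minimal numerical sketch of this design result, under a hypothetical horizon and mean constraint ρ̄ = 0.2: the geometric mean G(ρ_t) (and hence θ) is largest for constant reporting and falls as the Beta-sampled fractions become more variable.

```python
import numpy as np

rng = np.random.default_rng(3)
tau, rho_bar = 200, 0.2                                # hypothetical horizon and mean constraint

def geo_mean(x):
    return np.exp(np.mean(np.log(x)))

print("CONR:", geo_mean(np.full(tau, rho_bar)))        # equals rho_bar, the optimum
for b in [100.0, 10.0, 1.0, 0.2]:                      # smaller b => more variable Beta samples
    a = b * rho_bar / (1 - rho_bar)                    # fixes the Beta mean at rho_bar
    rho = rng.beta(a, b, size=tau)
    print(f"VARR (b={b}):", round(geo_mean(rho), 4))
```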
Fig. 2: The information loss in under-reporting. We investigate the effective information metric θ(C_1^τ) for variable reporting strategies (VARR) with reporting fraction ρ_t drawn from various Beta distributions. Panel A shows that while θ(C_1^τ) depends on the mean reporting probability (ρ̄), large fluctuations in the total information can emerge from the level of variability, controlled here by the Beta distribution shape. This is indicated by the overlap among metric values with different ρ̄. The grey line (dashed) is the stable or constant reporting (CONR) protocol, which is optimal. Our metric decreases with the variance of the protocol (var(ρ_t)), as seen in the inset with colours indicating a given ρ̄. Panel B illustrates the Beta sampling distributions and their resulting variance and metric scores (inset). The most variable reporting strategy (blue) is the worst protocol for a given ρ̄.

Panel A of Fig. 2 shows that θ(C_1^τ) generally increases with the mean reporting probability ρ̄. However, this improvement can be denatured by the variance, var(ρ_t), of the reporting scheme (inset, where each colour indicates the various schemes with a given ρ̄). The CONR scheme is outlined with a grey line (dashed) and, as derived, is the most informative. Panel B confirms our theoretical intuition on how var(ρ_t) reduces total information, with the extreme (worst) sampling scheme outlined above in blue and the most stable protocol in red. There are many ways to construct ρ_t protocols. We chose Beta distributions because they can express diverse reporting probability shapes using only two parameters.

Similarly, we examine reporting delays via G(F_{τ−t}), the geometric mean of the cumulative delay or latency distribution across time. Applying a mean delay constraint δ̄ = ∑_{x≥0} x δ_x = ∑_{t=1}^τ (1 − F_{τ−t}) (e.g., reflecting operational limits on the speed of case notification), we adapt experimental design principles (by maximising a FI determinant our results are termed D-optimal) [46]. These suggest that max G(F_{τ−t}) is achieved by cumulative distributions possessing the largest δ_0 within this constraint. Delay distributions with significant dispersion (e.g., heavy tails) attain this optimum, while constant delays (where the delay mass concentrates at δ̄, i.e., δ_{x=δ̄} = 1 and 0 otherwise) lead to the largest information loss under this constraint. This result may seem counter-intuitive as deterministic delays best preserve information outside of that delay and can be treated by truncating the observed epidemic time series (see the previous section), e.g., for a fixed weekly lag we would ignore the latest week of data. However, this causes a bottleneck. No information is available for that truncated week, eliminating any possibility of timely inference (and making epidemic control difficult [47]). In contrast, a maximally heterogeneous delay only slightly lags the majority of infected cases at the expense of large latencies for a few cases. This ensures that, overall, we gain more actionable information about the time series. We illustrate this point (and relax the mean constraint) in Fig. 3, where we verify the usefulness of Eq. (15) as a framework for comparing the information loss induced by delay distributions of various shapes and forms. We model δ as NB(k, δ̄/(δ̄+k)), with k describing the dispersion of the delay.
Panel A demonstrates how our θ(C_1^τ) metric varies with k (30 values taken between 10^{-1} and 10^2) at various fixed mean constraints (3 ≤ δ̄ ≤ 30, each given as a separate colour). As suggested by the theory, we see that decreasing k (increasing dispersion or delay heterogeneity) improves information at any given δ̄. The importance of both the shape and mean of reporting delays is indicated in the inset, as well as by the number of distributions (seen as intersects of the dashed black line) that result in the same θ(C_1^τ). Panel B plots corresponding cumulative delay probability distributions, validating our assertion from design theory that the best delays (blue, with metric in inset) are heterogeneous and force F_{τ−t} high very early on (maximise δ_0), while the worst ones are more deterministic (red, larger k). These curves are for OBNR delays and upper bound the performance of NEVR delays, except for the settings described in the previous section where both types coincide.

Fig. 3: The information loss from delays. We investigate the effective information metric θ(C_1^τ) for various delay distributions, which are negative binomial (NB) with dispersion k. Panel A examines how the mean delay (δ̄) and k influence the information loss, showing that various combinations can result in the same loss (intersections of the coloured curves, each representing a different δ̄, with the dashed black line). Further, the inset illustrates the variations in our metric at a given mean due to the shape of the delay distribution. Panel B confirms this relationship and indicates that the most dispersed distributions (smallest k, blue, with the largest start to the cumulative delay distribution F_{τ−t}) preserve the most information as compared to more deterministic delays (red, largest k). The insets additionally verify this point.
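A matching sketch for delays (all numbers hypothetical): at a fixed mean delay, the geometric mean of the cumulative delay probabilities, G(F_{τ−t}), is larger for more dispersed negative binomial delays (small k) than for near-deterministic ones (large k).

```python
import numpy as np
from scipy.special import gammaln

tau, mean_delay = 60, 7.0                          # hypothetical horizon and mean delay constraint

def nb_pmf(x, k, p):
    # NB(k, p) pmf with mean k(1 - p)/p; real-valued k allowed.
    return np.exp(gammaln(x + k) - gammaln(k) - gammaln(x + 1)
                  + k * np.log(p) + x * np.log1p(-p))

def geo_mean(x):
    return np.exp(np.mean(np.log(x)))

for k in [0.1, 1.0, 10.0, 100.0]:                  # dispersion; small k = heavy-tailed delay
    p = k / (mean_delay + k)                       # keeps the mean delay fixed
    cdf = np.cumsum(nb_pmf(np.arange(200), k, p))  # cumulative delay probabilities
    F = cdf[np.arange(tau - 1, -1, -1)]            # F_{tau - t} for t = 1..tau
    print(f"k={k}: G(F) = {geo_mean(F):.3f}")
```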
Our metric (Eq. (15)) not only allows the comparison of different under-reporting schemes and reporting delay protocols (see the above section) but also provides a common score for assessing the reliability or informativeness of diverse data streams for inferring R_1^τ. The best stream, from this information theoretic viewpoint, maximises the product of the geometric means G(.) of the cumulative delay probabilities F_{τ−t} and reporting fractions ρ_t. Many common surveillance data types used for inferring pathogen transmissibility have been modelled within the framework of Eq. (5) and therefore admit related θ(.) metrics. Examples include time series of deaths, hospitalisations, the prevalence of infections and incidence proxies generated from viral surveys of wastewater. We detail death count data in the following section but note that its model, provided in Eq. (6), is a simple extension of Eq. (5). Hospitalisations may be described similarly, with the ifr term replaced by the proportion of infections hospitalised and the intrinsic delay distribution now defining the lag from infection to hospital admission [1]. The infection prevalence conforms to Eq. (5) because it can be represented as a convolution of incidence with a duration of infectiousness distribution, which essentially contributes a reporting delay [48]. Viral surveys also fit Eq. (5). They offer a downsampled proxy of incidence, which is delayed by a shedding load distribution defining the lag before a case is detected in wastewater [49]. Consequently, our metrics are widely applicable.

While in this study we focus on developing methodology for estimating and contrasting the information from the above surveillance data, we find that our metric is also important for defining the complexity of a noisy renewal epidemic model. Specifically, we re-derive Eq. (15) as a key term of its description length (L). Description length theory evaluates the complexity of a model from how succinctly it describes its data (e.g., in bits) [35, 50]. This measure accounts for model structure and data quality and admits a standard three-term approximation. The first term indicates model fit by assessing the log-likelihood at our MLEs R̂_1^τ. The second term includes data quality through the number of parameters (p) and data size (m). The final term defines how model structure shapes complexity via an integral across the parameter space of R_1^τ. This formulation was adapted for renewal model selection problems in [31] assuming perfect reporting. We extend this and show that our proposed total information T(C_1^τ) plays a central role. Given some epidemic curve C_1^τ, we can rewrite this integral in terms of ∏_{t=1}^τ F_C(R_t)^{1/2} (since the FI matrix is diagonal) and observe that m = p = τ. It is known that under a robust transform such as R̃_t = 2√R_t this integral is conserved [35, 36]. Consequently, each factor of the integral reduces to the square root of the (now constant) FI multiplied by ∫ 1 dR̃_t over the transformed parameter range, with R_max as some maximum value that every R_t can take. Combining these expressions we obtain Eq. (16), highlighting the importance of our total information metric. If we have two potential data sources for inferring R_1^τ then we should select the one with the smaller L_C value. Since the middle term in Eq. (16) remains unchanged in this comparison, the key points when comparing model complexity relate to the level of fit to the data and the total Fisher information of the model given that data [50]. Using ∆ to indicate differences, this comparison may be formulated as ∆L_C ≈ −∆ℓ(R̂_1^τ) + ∆ log T(C_1^τ). The second term can be rewritten as ∆ log η(C_1^τ) (see Eq. (14)). This signifies that our metrics play a central part when comparing different data streams.
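The approximation referred to above (and its evaluation as Eq. (16)) is not displayed in this extract. A hedged reconstruction, using the standard Fisher information form of the description length [35, 50] with m = p = τ and the transformed parameters R̃_t, is roughly:

```latex
L_C \approx -\ell(\hat{R}_1^\tau) + \frac{p}{2}\log m
      + \log \int \prod_{t=1}^{\tau} F_C(\tilde{R}_t)^{1/2}\, d\tilde{R}_1^\tau
  \;\approx\; -\ell(\hat{R}_1^\tau) + \frac{\tau}{2}\log\tau
      + \frac{1}{2}\log \mathbb{T}(C_1^\tau) + \tau \log\!\big(2\sqrt{R_{\max}}\big),
```

where the last step uses the fact that F_C(R̃_t) is constant in R̃_t, so each integral factorises over the range [0, 2√R_max]. The exact constants, and how the factor of 1/2 on log T is absorbed, may differ in the paper's actual Eq. (16).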
Are COVID-19 deaths or cases more informative?

In the above sections we developed a framework for comparing the information within diverse but noisy data streams. We now apply these results to better understand the relative reliabilities of two popular sources of information about transmissibility R_1^τ: the time series of new cases C_1^τ and of new death counts D_1^τ. Both data streams have been extensively used across the ongoing COVID-19 pandemic to better characterise pathogen spread [1]. Known issues stemming from fluctuations in the ascertainment of COVID-19 cases [18, 21] have motivated some studies to assert D_1^τ as the more informative and hence trustworthy data for estimating R_1^τ [2, 11]. These works have reasonably assumed that deaths are more likely to be reliably ascertained. Case reporting can be substantially biased by testing policy inconsistencies and behavioural changes (e.g., symptom-based healthcare seeking). In contrast, given their severity, deaths should be less likely to be under-ascertained [1]. However, no analysis, as far as we are aware, has explicitly tested this assumption. Here we make some progress towards better understanding the relative merits of both data streams.

We start by computing ratios of our metric in Eq. (15) for both C_1^τ and D_1^τ via Eq. (5) and Eq. (6). This gives Eq. (17), in which the comparison is in terms of total information, with the infection to death cumulative distribution H_{τ−t} := ∑_{x=0}^{τ−t} γ_x. Eq. (17) states that case data are more informative if the geometric mean of the case to death reporting fractions is at least as large as that of the death and case delays. Studies preferring death data effectively claim that the variation in case reporting probabilities ρ_t (which we proved in a previous section always decreases the geometric mean of sampled incidence at a given mean probability) is sufficiently strong to mask the influences of the infection fatality ratio (ifr_t), the death reporting probability (σ_t) and any variations in those quantities. Proponents of using death data to infer R_1^τ also recognise that the infection to death delay (cumulative distribution H_{τ−t}) is appreciably larger in mean than the corresponding reporting lags from infection (F_{τ−t}), making death data unsuitable for real time estimation (where this extra lag denatures recent information, as we showed in the above sections). We allow for all of these adjustments. We assume that the infection fatality ratio is constant at ifr (maximising G(ifr_t)) and that death ascertainment is perfect (σ_t = 1). Even for purely retrospective estimation with correction for delays we expect G(H_{τ−t}/F_{τ−t}) ≤ 1. We allow this to be 1, maximising the informativeness of D_1^τ. Combining these assumptions we reduce Eq. (17) to Eq. (18). This presents a sufficient condition for case data to be more reliable than the death time series. Here we choose an ifr for COVID-19 of 1% [2]. Estimates of the mean sample fraction of cases range from about 7% to 38% [18], forming a constraint on ρ̄, the average ρ_t. Using these figures we examine possible ρ_t sampling distributions under the Beta(a, b) formulation from earlier sections. Our main results are in Fig. 4. We take 10^4 samples of ρ_t from each of 2000 distributions. These are parametrised over 10^{-1} ≤ b ≤ 10^2 with a set to satisfy our mean ρ̄ reporting constraints. Panel A plots our metric against those constraints (a different colour for each ρ̄) and the ifr threshold (black dashed). Whenever θ(C_1^τ) ≥ ifr we find that case data are more reliable. This appears to occur for many possible combinations of ρ_t. The inset charts the proportion of Beta distributions that cross that threshold. This varies from about 45% at ρ̄ = 0.07 to 90% at ρ̄ = 0.38. While these figures will differ depending on how likely a given level of variability is, they offer robust evidence that death counts are not necessarily more reliable. Even when deaths are perfectly ascertained (σ_t = 1), the small ifr term in D_1^τ means that 99% of the original incidence data is lost, contributing appreciable uncertainty. These points are reinforced by the numerous assumptions we have made, which inflate the information in the death time series (e.g., it is likely that σ_t < 1, ifr < 0.01 and neither is constant).
Panel B displays the distributions of our sampling fractions, with red (blue) indicating which shapes provide more (less) information than death data (see Eq. (18)). Note that these results hold for both real time and retrospective analyses, as we ignored the noise induced by the additional delays that death data contain (relative to case reports) when we maximised G(H_{τ−t}/F_{τ−t}).

Fig. 4: Epidemic case records may often be more informative than death counts. Using our metric θ(C_1^τ) we compare the effective information in epidemic case curves C_1^τ and death counts D_1^τ under assumptions that lead to Eq. (18). We examine various reporting strategies parametrised as Beta distributions with means ρ̄ ranging from 0.07 to 0.38 [18] and compare the resulting θ(C_1^τ) against the equivalent from deaths (which reduces to just the infection fatality ratio, ifr). Panel A indicates that many such distributions for sampled cases still contain more information than is available from deaths (proportion of vertical lines above the black dashed threshold, plotted inset). Panel B plots those distributions at the ends of the ρ̄ range, with red indicating when C_1^τ is more reliable. Substantial fluctuations in C_1^τ reporting can still preserve more information than might be found in D_1^τ.

Consequently, death data cannot be assumed, without rigorous and context-specific examination, to be generally more epidemiologically meaningful. For example, while D_1^τ is unlikely to be more reliable in well-mixed populations, it may be in high-risk settings where the local ifr is notably larger (e.g., care homes). Vaccines, which substantially reduce ifr values in most contexts, will make death time series less informative about R_1^τ. However, pathogens such as Ebola virus, which induce large ifr parameters, will likely lead to death data that are more reliable than their case counts.
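A small numerical sketch of the Eq. (18) comparison under the assumptions above (σ_t = 1, constant ifr, the death-to-case delay penalty set to its most favourable value of 1): case curves are the more informative stream whenever θ(C_1^τ) = G(ρ_t)G(F_{τ−t}) ≥ ifr. The delay distribution and Beta shapes are hypothetical; the ifr of 1% and the 7% to 38% mean reporting range follow [2] and [18].

```python
import numpy as np

rng = np.random.default_rng(4)
tau, ifr = 200, 0.01                                   # COVID-19-like ifr of 1% [2]

def geo_mean(x):
    return np.exp(np.mean(np.log(x)))

delta = np.array([0.6, 0.25, 0.1, 0.05])               # hypothetical short case reporting delay
F = np.cumsum(delta)[np.minimum(np.arange(tau)[::-1], len(delta) - 1)]   # F_{tau - t}

for rho_bar in [0.07, 0.38]:                           # mean case reporting range from [18]
    for b in [50.0, 1.0, 0.2]:                         # increasingly variable reporting
        a = b * rho_bar / (1 - rho_bar)
        rho = rng.beta(a, b, size=tau)
        theta = geo_mean(rho) * geo_mean(F)
        verdict = "cases" if theta >= ifr else "deaths"
        print(f"rho_bar={rho_bar}, b={b}: theta={theta:.3f} -> {verdict} more informative")
```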
This approach yielded a new non-dimensional metric, θ(.), that allows analytic and generalisable insights into how noisy surveillance data degrades estimate precision. Using this metric we characterised the impact of different types of delay and under-reporting schemes. We demonstrated that under mean surveillance constraints, constant under-reporting of cases minimises loss of information. However, constant delays in reporting maximise this loss. The first result bolsters conventional thinking [9] , while the second highlights the need for timely data [47] . Importantly, our metric provided insight into the nuances of noise, elucidating how the mean and variability of schemes both matter. For example, fluctuating reporting protocols with larger mean may outperform stable ones at lower mean. Exploiting this and the flexibility of our framework, which can describe the noise in cases, death counts, hospitalisations, infection prevalence and wastewater virus surveys, we demonstrated how diverse data sources could be ranked. Specifically, we critiqued a common assertion about death and case data. Because the reporting of cases can vary significantly when tracking acute diseases such as COVID-19, various studies have assumed death data to be more reliable [1] . Using our metric, we presented one of the first qualifications of this claim. We found that the infection fatality ratio acts as an under-reporting factor with very small mean. Only the most severely varying case reporting protocols result in larger information loss, suggesting that in many instances this assertion may not hold. Note that this analysis does not even consider the additional advantages that case data bring in terms of timeliness. However, there may be other important reasons for preferring death data (e.g., when little is known about the reporting protocol). As hospitalisation counts effectively interpolate among the types of noise in cases and deaths, this might serve as the best a priori choice of data for inferring transmissibility. Some studies also propose to circumvent these ranking issues by concurrently analysing multiple data sources [43, 51] . This then opens questions about how each data stream should be weighed in the resulting estimates. Our framework may also help by quantifying the most informative parts of contributing streams. A common way of deriving consensus weighs individual estimates by their inverse variance [52] . As the Fisher information defines the best possible inverse variance of estimates, our metrics naturally apply. While our framework can enhance understanding and quantification of surveillance noise, it has several limitations. First, it depends on renewal model descriptions of epidemics [26] . These models assume homogeneous mixing and that the generation time distribution of the disease is known. While the inclusion of more realistic network-based mixing may not improve transmissibility estimates [53] (and this extra complexity may occlude insights), the generation time assumption is a true constraint that may only be ameliorated through the provision of updated, high quality line-list data [54] . Further, our analysis is contingent on having knowledge of the delays, under-ascertainment rates and other noise sources within data. These may be unavailable or themselves highly unreliable. 
To include this additional uncertainty is an important next step for this work, which will likely involve recomputing our metrics using posterior Fisher information terms [34, 55] that allow prior distributions on the noise parameters. We also assumed that the time scale chosen ensures that the R_t parameters are independent. This may be invalid. In such instances we can append non-diagonal terms to the Fisher information matrices or use our metric as an upper bound. Last, we defined the reliability or informativeness of a data stream in terms of minimising the joint uncertainty of the entire sequence of reproduction numbers R_1^τ. This is known as a D-optimal design [38]. However, we may instead want to minimise the worst uncertainty among the R_1^τ (which may better compensate known asymmetries in inferring transmissibility [56]). Our framework can be reconfigured to tackle such problems by appealing to other design laws. We can solve this specific problem by deriving an E-optimal design, which maximises the smallest eigenvalue of our Fisher information matrix.

All data and code (Matlab v2021a) necessary for reproducing the analyses and figures in this manuscript, as well as for applying the methods we have developed, are freely available at https://github.com/kpzoo/information-in-epidemic-curves.

For some infectious outbreaks the case (Eq. (5)) and death (Eq. (6)) count time series might be overdispersed [11], i.e., the mean-variance equality inherently assumed by the Poisson renewal model may not be valid. This can result when transmission is strongly influenced by heterogeneities in contacts and infectiousness or when incidence data are corrupted by additional intrinsic noise. These effects are often modelled by generalising Eq. (1) to a negative binomial (NB) form [2, 57], with dispersion k and success probability p. This leads to an NB model for I_t with mean Λ_t R_t and second argument p. As we focus on the impact of the heterogeneity here, we assume perfect reporting (reporting noise as in the main text only affects the mean of this model). Taking derivatives of the log-likelihood corresponding to the NB model and setting them to zero gives the MLE R̂_t = I_t Λ_t^{-1}, which is the same as for Eq. (1). Computing E[−∂²ℓ(R_t)/∂R_t²] yields the Fisher information (FI) that I_t contains about R_t. The first term of the resulting expression is the FI from Eq. (3). Heterogeneity therefore subtracts from this FI a dispersion-controlled term. As k → ∞ this term disappears and NB → Pois. While we do not explicitly include heterogeneity in the analyses of the main text, we do show, importantly, that our results do not qualitatively change if overdispersion is included. There we rank epidemic and death curves, which have been corrupted by under-reporting and delays, according to their Poisson FI. Curves with larger FI are deemed more reliable sources of information about R_t.
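The NB expressions discussed above are not displayed in this extract; a hedged reconstruction consistent with the stated properties (mean Λ_t R_t, first FI term equal to the Poisson FI of Eq. (3), correction vanishing as k → ∞) is:

```latex
I_t \sim \mathrm{NB}\!\left(k,\; p = \frac{k}{k + \Lambda_t R_t}\right),
\qquad \mathbb{E}[I_t] = \Lambda_t R_t,
\qquad
F_I(R_t) = \frac{\Lambda_t}{R_t} - \frac{\Lambda_t^2}{k + \Lambda_t R_t}
  \;\xrightarrow{\;k \to \infty\;}\; \frac{\Lambda_t}{R_t}.
```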
If the delay and under-reporting noise do not alter k (i.e., the level of transmission heterogeneity is stable), then this Poisson FI ordering remains valid. We confirm this in Fig. 5, demonstrating a monotonic relationship between the FI from both models at fixed k. Our main results are therefore moderately robust to heterogeneities.

Fig. 5: Overdispersion maintains Fisher information based rankings. We plot the FI under an NB observation model, which models overdispersion or heterogeneities in transmission, against the FI from the corresponding Poisson renewal model (with the same mean) at R_t = 1. For various dispersion parameters, k (smaller k means more heterogeneity), we find a monotonic ordering between these two FI values. This suggests that the rankings of data sources derived from their Poisson FI will likely also remain preserved under more complex NB models, provided those data sources have similar k values.

Information if case source times are never reported

Eq. (9) allows us to compute the FI of under-reported and delayed surveillance data. However, it assumes that we eventually uncover the true time of infection of the delayed cases. This is only valid for OBNR data [13], for which we can separate C_t from Eq. (5) into components ∑_{x=1}^t Pois(δ_{t−x} ρ_x Λ_x R_x) for every t. Here we compute the FI for the more complex case of NEVR delays, in which the source dates of delayed cases are never reported and hence this sum cannot be decomposed. Specifically, we solve Eq. (10) for a three-dimensional example, i.e., τ = 3, and prove that we obtain a triangular FI matrix that has diagonal terms as in Eq. (11). We then compute the total information T(C_1^τ) for never reported delays (the NEVR analogue to Eq. (13)), showing that it involves a product of those diagonal terms. This allows us to assert Eq. (13) as an upper bound and to prove convergence of the two when the effective reproduction numbers are constant, i.e., R_t = R for all t. The algorithms we apply serve for any τ by inductively repeating the procedure here (for code see https://github.com/kpzoo/information-in-epidemic-curves).

We start from Eq. (10) with α_{(t−x)x} := δ_{t−x} ρ_x Λ_x to define the complete log-likelihood. Taking derivatives with respect to R_1 gives ∂ℓ(R_1^3)/∂R_1; this is repeated for all other R_t and conforms to the description in the main text. When solved and rearranged this gives the MLEs in Eq. (12). We then calculate E[−∂²ℓ(R_1^τ)/∂R_t∂R_x] for every combination of t and x, with the expectation of any C_t following as the mean in the Poisson model of Eq. (5). This generates the complete FI matrix, F_C, with β_1 = α_{01}R_1, β_2 = α_{02}R_2 + α_{11}R_1 and β_3 = α_{03}R_3 + α_{12}R_2 + α_{21}R_1 as the Poisson means; element (t, x) of F_C sums α_{(s−t)t} α_{(s−x)x} β_s^{-1} over s ≥ max(t, x). The total information that C_1^3 contains about R_1^3 is already computable from the determinant of this matrix. However, we can obtain a more illuminating form by applying elementary column operations. Such operations do not change the determinant. Using col(x) for column x, we successively apply col(2) → col(2) − (α_{12}/α_{03}) col(3), col(1) → col(1) − (α_{21}/α_{03}) col(3) and col(1) → col(1) − (α_{11}/α_{02}) col(2), revealing a triangular matrix. This procedure of subtracting multiples of the later columns from earlier ones can be repeated for any τ to extract analogous triangular forms. The total information T(C_1^3) therefore depends on the diagonals of this triangular matrix. When written out, this corresponds to a product of the terms given in Eq. (11) for all t.
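A minimal numerical sketch of this τ = 3 construction (all inputs hypothetical): it assembles the expected NEVR FI matrix from the Poisson means β_t and checks that its determinant is bounded by the product of the diagonal OBNR terms of Eq. (9).

```python
import numpy as np

# Hypothetical inputs for tau = 3: delay probabilities, reporting fractions,
# total infectiousness and reproduction numbers.
delta = np.array([0.6, 0.3, 0.1])          # delta_0, delta_1, delta_2
rho   = np.array([0.5, 0.4, 0.6])
Lam   = np.array([20.0, 30.0, 25.0])
R     = np.array([1.5, 1.2, 0.9])

tau = 3
alpha = np.zeros((tau, tau))               # alpha[t, x] = delta_{t-x} rho_x Lam_x for x <= t
for t in range(tau):
    for x in range(t + 1):
        alpha[t, x] = delta[t - x] * rho[x] * Lam[x]

beta = alpha @ R                           # Poisson means of C_1..C_3 under Eq. (5)

# Expected FI for NEVR delays: E[-d^2 l / dR_x dR_y] = sum_t alpha[t, x] alpha[t, y] / beta[t].
F_nevr = (alpha.T / beta) @ alpha

# OBNR diagonal terms (Eq. (9)): F_{tau-t} rho_t Lam_t / R_t, i.e., column sums of alpha over R.
obnr_diag = alpha.sum(axis=0) / R

print("det FI (NEVR):   ", np.linalg.det(F_nevr))
print("prod OBNR terms: ", np.prod(obnr_diag))     # should be at least as large
```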
Each term is smaller than or equal to the corresponding term from an OBNR delay, given in Eq. (9). We can therefore use the total information in Eq. (13) to upper bound that available from a time series with NEVR delays. Intriguingly, this bound is sharp in some important instances. Specifically, if transmissibility is constant, so that R_t = R for all t, then ∂ℓ(R)/∂R = Σ_{t=1}^{3} (C_t R^{-1} − F_{3−t} ρ_t Λ_t) and consequently F_C(R) = Σ_{t=1}^{3} F_{3−t} ρ_t Λ_t R^{-1}. This is precisely the FI obtained under an OBNR delay (by summing Eq. (9)). The MLEs for both delay types are also equal.

References

[1] Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK: methods of estimation, data sources, causes of heterogeneity, and use as a guide in policy formulation
[2] Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
[3] The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: a modelling study across 131 countries
[4] How modelling can enhance the analysis of imperfect epidemic data
[5] Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the western area region of Sierra Leone
[6] The R value and growth rate
[7] Resurgence of SARS-CoV-2: Detection by community viral surveillance
[8] Practical considerations for measuring the effective reproductive number, Rt
[9] Reporting errors in infectious disease outbreaks, with an application to pandemic influenza A/H1N1
[10] Reconstructing influenza incidence by deconvolution of daily mortality time series
[11] Reduction in mobility and COVID-19 transmission
[12] Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures
[13] Quantitative Methods for Investigating Infectious Disease Outbreaks
[14] Adjustments for reporting delays and the prediction of occurred but not reported events
[15] Bayesian outbreak detection in the presence of reporting delays
[16] Unreported cases in the 2014-2016 Ebola epidemic: Spatiotemporal variation, and implications for estimating transmission
[17] Nine challenges in incorporating the dynamics of behaviour in infectious diseases models
[18] Underdetection of cases of COVID-19 in France threatens epidemic control
[19] A new framework and software to estimate time-varying reproduction numbers during epidemics
[20] An exact method for quantifying the reliability of end-of-epidemic declarations in real time
[21] The Impact of Changes in Diagnostic Testing Practices on Estimates of COVID-19 Transmission in the United States
[22] Pandemic Potential of a Strain of Influenza A (H1N1): Early Findings
[23] Bayesian Nowcasting during the STEC O104:H4 Outbreak in Germany
[24] Transmission dynamics of the 2009 influenza (H1N1) pandemic in India: The impact of holiday-related school closure
[25] Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
[26] Estimating individual and household reproduction numbers in an emerging epidemic
[27] How generation intervals shape the relationship between growth rates and reproductive numbers
[28] Influenza transmission in households during the 1918 pandemic
[29] Ebola virus disease in West Africa - the first 9 months of the epidemic and forward projections
[30] Revealing the Microscale Signature of Endemic Zoonotic Disease Transmission in an African Urban Setting
[31] Adaptive estimation for epidemic renewal and phylogenetic skyline models
[32] Theory of Point Estimation
[33] Mutual information, Fisher information, and population coding
[34] Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions?
[35] The Minimum Description Length Principle
[36] Robust design for coalescent model inference
[37] The use of transformations
[38] Optimal experimental designs
[39] Counting probability distributions: Differential geometry and model selection
[40] Measuring the path toward malaria elimination
[41] On the estimation of the reproduction number based on misreported epidemic data
[42] Outbreak analytics: a developing data science for informing the response to emerging pathogens
[43] Inferring the effectiveness of government interventions against COVID-19
[44] A proof of the Fisher information inequality via a data processing argument
[45] Elements of Information Theory
[46] Inequalities: Theory of Majorization and its Applications
[47] Can the COVID-19 Epidemic Be Controlled on the Basis of Daily Test Reports?
[48] On the use of aggregated human mobility data to estimate the reproduction number
[49] Wastewater-based estimation of the effective reproductive number of SARS-CoV-2
[50] Fisher information and stochastic complexity
[51] Four key challenges in infectious disease modelling using data from multiple sources
[52] Statistical meta-analysis with applications
[53] Measurability of the epidemic reproduction number in data-driven contact networks
[54] Serial interval of SARS-CoV-2 was shortened over time by nonpharmaceutical interventions
[55] Posterior Cramer-Rao Bounds for Discrete-Time Nonlinear Filtering
[56] Fundamental limits on inferring epidemic resurgence in real time using effective reproduction numbers
[57] Superspreading and the effect of individual variation on disease emergence

Acknowledgements

Thanks to Matthew Hickman for providing useful and interesting comments on the manuscript.