key: cord-0256972-hvni365h authors: Caudana, B. title: Mathematical Relationship between Effective Reproduction Number Rt and Epidemic Curve of Daily Cases -- Demonstration and Details date: 2021-01-25 journal: nan DOI: 10.1101/2021.01.24.21250405 sha: 671145ba5d86df1452fd36bd94d6d0e875b95866 doc_id: 256972 cord_uid: hvni365h

The strict mathematical relationship between $R_t$ and the curve of daily cases $f(t)$ is shown. An up-to-date and statistically robust $R_t$ can be estimated from the curve of daily cases as soon as new cases are added to the curve. That is equivalent to estimating $R_t$ by averaging over all detected cases of infection, without any distortion induced by the difficulty of following and weighting trees of secondary cases from the original ones, and without needing to wait for secondary cases to manifest infection. With this method, if scaled $R_t$ numbers are of interest, only the average duration of infectivity of subjects has to be estimated directly, and this can be done independently of linking secondary cases to primary ones. A new index, the instantaneous reproduction number $R_{ist}$, is introduced, which does not depend on the duration of infectivity of subjects. $R_{ist}$, $R_t$ and the doubling/halving time of the epidemic may be estimated by simple computations at the very time new daily cases are detected. Any smoothed curve of daily cases gives smooth $R_t$ and $R_{ist}$. No phase lag in the $R_t$ estimate is introduced by this method.

KEYWORDS: epidemics, reproduction number, daily cases, cumulative counts, epidemic curve, mathematics

During the first phase of the COVID-19 epidemic I encountered estimates of $R_t$ that were incompatible with the doubling time of daily cases and with the location in time of the peaks, so I began to think about the subject. It seems that $R_t$ was defined, from the epidemiological point of view, under the assumption that an epidemic can be characterized by a somewhat stable relationship between a pathogen and its infectable host, in the hope of predicting the evolution of an outbreak. But the relationship is not stable. In fact, the initial susceptibility of a population of hosts is always unknown, because the spectrum of immune responses and the immune history of the population are unknown. Besides, both pathogen and host can modify this relationship in several ways (decreasing susceptibility of the host population as the spreading epidemic saturates a population or sub-population of susceptible individuals, reaction of immune systems, reactive behaviors of the host and pathogen populations, etc.).

This paper shows how the definition of $R_t$ is strictly tied to the curve of daily cases by mathematical equations. The two are essentially the same thing expressed in different words: $R_t$ is a sort of first derivative of the curve of daily cases with respect to time $t$.
The difficulty of directly estimating $R_t$ in a reliable way is the same as that of predicting the evolution of an epidemic in a reliable way. It is indeed even harder, since one faces the further uncertainty of estimating trees of secondary cases. It is similar to estimating the distance traveled by integrating the readings of very inaccurate accelerometers, but much harder and more error-prone. The excellent articles by Cori, et al. [2] and Dietz [3] clearly show this difficulty.

The epidemiological definition of $R_t$ states: $R_t$ is the number of secondary infections caused, on average, by a single case of disease during its period of infectivity in a completely susceptible population.

According to this epidemiological definition, $R_t$ is analogous to the multiplier of the initial unit capital after 1 period in a compound capitalization process. This analogy allows the estimation of $R_t$ from the epidemic curve of daily cases by introducing the concept of the Instantaneous Reproduction Number $R_{ist}$, similar to the instantaneous capitalization rate in actuarial mathematics.

The epidemiological definition of $R_t$ (and of its cousin $R_0$, its limit at the beginning of an epidemic in an uninfected population) indicates an exponential expansion. By the end of its period of infectivity, an infected individual will have produced one infected individual plus (or minus) a number $r$ of further infected individuals. Let's say, for example, one infected plus another one and a half infected, equal to two and a half infected (1 + 1.5 = 2.5). After 2 periods of infectivity, the infected will be those of the previous period (2.5), each of which will have infected new ones (1 + 1.5 = 2.5): that is, the (1 + 1.5) of period 1 multiplied by the (1 + 1.5) of period 2, and so on:
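The worked example above (each case multiplying by 1 + 1.5 = 2.5 per infectivity period) can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `multiplier_after` and the default value `r = 1.5` are assumptions taken from the example, not part of the original method.

```python
def multiplier_after(periods: int, r: float = 1.5) -> float:
    """Multiplier of one initial case after `periods` infectivity periods.

    Each period multiplies the infected by (1 + r), exactly as a unit
    capital grows under compound interest at rate r.
    """
    return (1.0 + r) ** periods


# Multipliers after 0, 1, 2 and 3 infectivity periods.
growth = [multiplier_after(n) for n in range(4)]
print(growth)  # [1.0, 2.5, 6.25, 15.625]
```

The first entry is the single initial case; the second is the value the text identifies with $R_t$.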
• after period 1: $1 \cdot (1 + r) = R_t$;
• after period 2: $1 \cdot (1 + r) \cdot (1 + r) = 1 \cdot (1 + r)^2$;
• and so on, after period $n$: $1 \cdot (1 + r)^n$.

In fact, this process is equivalent to the growth of a capital under compound interest at rate $r$, where $R_t$ is the amount after period 1. To find the interest rate $r$ that, compounded over $n$ fractions of a period, gives the amount $R_t$ after 1 period (here $r$ denotes the nominal rate of the fractional compounding, not the per-period increment used in the list above), we can write:

$$\left(1 + \frac{r}{n}\right)^{n} = R_t$$

Passing to the limit for $n \to \infty$, and noting that $\lim_{n \to \infty} \left(1 + \frac{r}{n}\right)^{\frac{n}{r}} = e$, we get:

$$\lim_{n \to \infty} \left(1 + \frac{r}{n}\right)^{n} = e^{r} = R_t$$

In other terms, $r$ is the exponent to be given to $e$ to obtain $R_t$ after a period of infectious duration equal to 1. That is:

$$R_{ist} = r = \ln R_t$$

If we want to express $R_{ist}$ in a unit of time other than the dimensionless unit period, for example the days (or hours) in which we measure the duration $g_i$ of the infectivity period of an infectious subject and in which we measure the progress of the epidemic, we can write:

$$e^{R_{ist} \cdot g_i} = R_t$$

from which:

$$R_{ist} = \frac{\ln R_t}{g_i}$$

In this way we have the parameter $R_{ist}$, which characterizes the exponential growth (as per the definition of $R_t$) generated at the point in time $t$ by the increase (or decrease) of the daily cases.

Connecting $R_{ist}$ to the epidemic curve of daily cases

Whenever an exponential function $y = e^{ax}$ is represented in logarithmic scale, $\ln(y) = ax$, it becomes a straight line. Its shape factor $a$ becomes the slope of the straight line (the angular coefficient). If we represent the curve of daily cases $f(t)$ in logarithmic scale, $h(t) = \ln(f(t))$, the slope of the tangent of $h(t)$ at point $t$ is $R_{ist}$: it corresponds to the exponential growth of the epidemiological definition of the effective reproduction number $R_t$, represented in logarithmic scale, at time $t$, and scaled in the time units of the curve of daily cases. But the slope of the tangent of $h(t)$ at point $t$ is also the first derivative of $h(t)$, that is:

$$R_{ist} = \frac{dh(t)}{dt} = \frac{d \ln(f(t))}{dt}$$

A different reasoning perhaps better illustrates the concept of estimating $R_t$ from epidemic curves.
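As a numerical check of the compounding derivation and of the log-slope identity above, the following minimal sketch may help. The function names, the synthetic curve, and the values `R_T = 2.5` (from the earlier example) and `G_I = 13.0` days (the infectivity duration mentioned in the figure captions of this paper) are illustrative assumptions, not estimates.

```python
import math


def r_ist_from_rt(rt: float, g_i: float) -> float:
    """R_ist = ln(R_t) / g_i, with g_i the infectivity duration in days."""
    return math.log(rt) / g_i


def rt_from_r_ist(r_ist: float, g_i: float) -> float:
    """R_t = exp(R_ist * g_i)."""
    return math.exp(r_ist * g_i)


R_T = 2.5    # assumed value, taken from the worked example
G_I = 13.0   # assumed average infectivity duration, in days

# 1) (1 + r/n)^n approaches e^r = R_t as n grows, with r = ln(R_t).
r = math.log(R_T)
limit_values = {n: (1.0 + r / n) ** n for n in (1, 10, 1000, 1_000_000)}

# 2) Round trip between R_t and R_ist.
r_ist = r_ist_from_rt(R_T, G_I)
rt_back = rt_from_r_ist(r_ist, G_I)


# 3) On a purely exponential synthetic curve of daily cases, the slope of
#    ln f(t) equals R_ist (here measured over one day).
def f(t: float) -> float:
    return 100.0 * math.exp(r_ist * t)


log_slope = math.log(f(11.0)) - math.log(f(10.0))

for n, value in limit_values.items():
    print(n, value)
print(r_ist, rt_back, log_slope)
```

The printed sequence should approach `R_T` from below, and `log_slope` should agree with `r_ist` up to floating-point rounding.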
$R_t$ is basically the ratio between the daily cases at time $t + 1$ and the daily cases at time $t$, where 1 is the infecting period. Given the point $a$ on the curve of daily cases that precedes the point $b$ by one infecting period:

$$R_t = \frac{f(b)}{f(a)}$$

This ratio represents the rate of increase (if > 1) or decrease (if < 1) of the infections, averaged over all the infections observed. It includes all the information on the overall average resistance to the spread of the infection that may have formed in the meantime, for any reason, known or unknown. It also takes into properly weighted account all the overlaps of the infection trees defined by $R_t$ and the varying susceptibility of the hosts. Furthermore, the value obtained in this way is a very accurate value of the $R_t$ acting at the current time $b$, that is, at the very moment in which the current number of infected cases is known. The passage to the limit of a period that tends to an instant, implicit in differentiating with respect to $t$, yields a curve of the $R_t$ trend that is always updated in real time.

According to the epidemiological definition of $R_t$, we have the following correspondence of classical notable cases, a direct consequence of that definition:

• $R_{ist} > 0$ when the daily cases increase and the epidemic is expanding: therefore the associated $e^{R_{ist} \cdot g_i} = R_t > 1$.
• $R_{ist} = 0$ when the daily cases remain constant and the epidemic is stationary: therefore the associated $e^{R_{ist} \cdot g_i} = R_t = 1$. In this case the curve of daily cases has a minimum or a maximum; $R_{ist}$ crosses 0; $R_t$ crosses 1.
• $R_{ist} < 0$ when the daily cases decrease and the epidemic is contracting: therefore the associated $e^{R_{ist} \cdot g_i} = R_t < 1$.

Since these notable cases derive from the epidemiological definition of $R_t$, they are also a criterion for evaluating whether an estimate of $R_t$ is correct. A value of $R_t$ that contradicts the epidemic curve indicates that either the $R_t$ estimate or the epidemic curve is wrong.
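The ratio reading of $R_t$ and the three notable cases above can be sketched as follows. The helper names, the tolerance `tol`, and the sample figures (1000 and 1500 daily cases, 13 days of infectivity) are hypothetical and serve only to illustrate the consistency criterion.

```python
import math


def rt_from_ratio(cases_a: float, cases_b: float) -> float:
    """R_t as the ratio of daily cases at b to daily cases at a,
    where a precedes b by one infectivity period."""
    return cases_b / cases_a


def epidemic_state(r_ist: float, tol: float = 1e-12) -> str:
    """Classify the epidemic phase from the sign of R_ist."""
    if r_ist > tol:
        return "expanding"      # R_t > 1
    if r_ist < -tol:
        return "contracting"    # R_t < 1
    return "stationary"         # R_t = 1


def is_consistent(rt_estimate: float, cases_a: float, cases_b: float) -> bool:
    """An R_t estimate must lie on the same side of 1 as the ratio of
    daily cases one infectivity period apart."""
    side_estimate = epidemic_state(math.log(rt_estimate))
    side_curve = epidemic_state(math.log(rt_from_ratio(cases_a, cases_b)))
    return side_estimate == side_curve


G_I = 13.0                            # assumed infectivity duration, days
rt = rt_from_ratio(1000.0, 1500.0)    # hypothetical daily cases at a and b
r_ist = math.log(rt) / G_I
print(rt, r_ist, epidemic_state(r_ist))
print(is_consistent(0.9, 1000.0, 1500.0))  # False: estimate contradicts curve
```

A published $R_t$ below 1 while the daily cases one infectivity period apart are growing would fail this check, which is the kind of incompatibility the text describes.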
The curve of daily cases $f(t)$ expressed in logarithmic scale with base $e$ is obviously given by:

$$h(t) = \ln(f(t))$$

$R_{ist}$ is given by the first derivative (numerically or analytically determined) of any smoothed curve of daily cases, expressed in logarithmic scale with base $e$:

$$R_{ist} = \frac{dh(t)}{dt}$$

Please notice that if the smoothing procedure applied to the curve of daily cases introduces a phase lag, as moving averages or FIR/IIR filters do, the same phase lag will appear in the estimates of $R_{ist}$ and $R_t$. If instead some form of static averaging is used, such as a least squares fitting procedure, no phase lag is introduced.

$R_t$ is given by:

$$R_t = e^{R_{ist} \cdot g_i}$$

$R_{ist}$ is also equivalent to:

$$R_{ist} = \frac{f'(t)}{f(t)}$$

The doubling or halving time of infection $g_{d \lor h}$ is obtained by imposing 2.0 as $R_t$ and computing the resulting number of days (negative numbers represent halving times):

$$g_{d \lor h} = \frac{\ln 2}{R_{ist}}$$

The following charts show how $R_t$ may be estimated starting from a fit of the curve of cumulative cases, with a sort of derivative of order 2. The curve of daily cases is obviously the first derivative of the cumulative cases. The fit is primarily done on cumulative cases because they automatically compensate for some kinds of errors (for example, a case missed one day may be detected in the following days). The model and fitting techniques used for the following figures are outside the scope of this paper. Here the model is simply used as a source of a smoothed daily data set. The other formulas used to generate the following charts are summarized in the section above.
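The formulas summarized above can be sketched in plain Python as follows. This is not the model used for the charts: the bell-shaped synthetic curve, its parameters (`PEAK_DAY`, `SIGMA`, the amplitude 5000), the central-difference derivative, and the value `G_I = 13.0` days are all assumptions chosen only to give a smooth daily data set to differentiate.

```python
import math

G_I = 13.0  # assumed average infectivity duration, in days


def r_ist_series(daily_cases, dt=1.0):
    """Numerical derivative of h(t) = ln f(t): central differences in the
    interior, one-sided differences at the two ends."""
    h = [math.log(v) for v in daily_cases]
    last = len(h) - 1
    out = []
    for k in range(len(h)):
        lo, hi = max(k - 1, 0), min(k + 1, last)
        out.append((h[hi] - h[lo]) / ((hi - lo) * dt))
    return out


def rt_series(r_ist, g_i=G_I):
    """R_t = exp(R_ist * g_i) for every point of the curve."""
    return [math.exp(r * g_i) for r in r_ist]


def doubling_or_halving_time(r_ist):
    """ln(2) / R_ist: positive = doubling time, negative = halving time.
    Returned as infinity when the curve is stationary (R_ist = 0)."""
    return math.inf if r_ist == 0 else math.log(2.0) / r_ist


# Synthetic smoothed curve of daily cases with a single peak (assumption).
PEAK_DAY, SIGMA = 30, 10.0
daily = [5000.0 * math.exp(-((t - PEAK_DAY) ** 2) / (2.0 * SIGMA ** 2))
         for t in range(61)]

r_ist = r_ist_series(daily)
rt = rt_series(r_ist)

for day in (20, 30, 40):
    print(day, r_ist[day], rt[day], doubling_or_halving_time(r_ist[day]))
```

On this curve $R_{ist}$ is positive before the peak, zero at the peak (where $R_t$ equals 1), and negative afterwards, matching the three notable cases; no extra delay is introduced by the differentiation itself.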
The data source used for this fit is the official COVID-19 one for Italy: https://github.com/pcm-dpc/COVID-19/ [1]

Just a glance at the dispersion of an ample set of daily data around a good fit of these data makes it easy to imagine how difficult and unreliable any attempt would be to estimate the trend of an epidemic from small samples of its derivatives, relying on considerations about how these samples spread over overlapping trees of secondary cases, which is what the epidemiological definition of $R_t$ asks us to do.

Moreover, the dynamics of an epidemic seem to follow unpredictable and chaotic behavior. We are used to thinking of the populations involved in an epidemic as an isotropic material, like steel, which behaves equally in all directions with respect to stress and strain. Perhaps an epidemic is better depicted as acting on many different relationship fabrics entangled together. A burst of infections occurs when two or more entangled fabrics (each of which may be in a stable infective condition that eventually saturates) mix, and the new connections merge them into a new, more extended fabric. If this is a plausible landscape of the infection of a population, not every link in this entanglement of networks has the same infection capacity, and not all nodes of these networks are isotropically connected. In other words, there may be several networks that have poor connections with each other while having strong connections among their own members. For example, the network of families with children who go to the same school may have strong links between the families of teachers and classmates, but weak connections with other, unrelated networks of parents-children-teachers. Some of these networks may eventually saturate, while others may not even have been infected. The same thing happens with other types of relational networks. This is a very anisotropic environment. This landscape presents a very challenging non-linear object to investigate.
Maybe it has some emerging regularities at the macroscopic level, like sequences of overlapping sigmoidal shapes in the curve of cumulative cases.

[1] Dati COVID-19 Italia. https://github.com/pcm-dpc/COVID-19/
[2] Cori, et al. A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics.
[3] Dietz. The estimation of the basic reproduction number for infectious diseases.
[4] Modello a diffusione-saturazione per andamento COVID-19.

Figure: Basic reproduction number R_t. R_t: reproduction number assuming 13 days of infectivity on average. Confirmed cases -- 999 -- Italy (P rif = 60M, i c = 0.0594 since