key: cord-0772490-eh08e8q1
authors: Srinivasa Rao, Arni S.R.; Krantz, Steven G.
title: Ground Reality Versus Model-Based Computation of Basic Reproductive Numbers in Epidemics
date: 2021-01-27
journal: J Math Anal Appl
DOI: 10.1016/j.jmaa.2021.125004
sha: 9a47498cb44a8b67a974e436e26581b4a19af267
doc_id: 772490
cord_uid: eh08e8q1

Computation of basic reproductive numbers is one of the primary goals of epidemic modelers. Such computations face several challenges, especially when data from the virus transmission networks are hard to collect, which makes model validation almost impossible. We provide a technical comment on the precautions to be taken while computing model-based basic reproductive numbers so that the ground realities of such computation are maintained. Basic reproductive numbers need to be adjusted retrospectively to compensate for reporting errors within the epidemic spread networks. Such an adjustment would lead to revised pandemic preparedness and mitigation plans.

The basic reproductive number ($R_0$) of an epidemic is often considered a key parameter for understanding its spread. The general scientific community has discussed and debated at length the computation of $R_0$ for various populations affected by COVID-19 (see, for example, [1, 2, 3]). This parameter is normally computed from a mathematical model built for a specific population's epidemic spread, and its value is often not verified at the population level [4]. If complete contact-tracing data on infected and susceptible individuals are available in a population at a given time point, say $t_1$, and such data are collected again at another time point, say $t_2$ with $t_2 > t_1 > 0$, then one can compute $R_0$ from these data sets and assign the value obtained to the time interval $[t_1, t_2]$.
When complete data are available to compute $R_0$, we do not need to build a model for the same purpose, because a model can never be better than what is given by the true data. After all, the purpose of computing $R_0$ is to help glean the ground-level reality of an epidemic spread for the time period for which $R_0$ is computed [5, 6, 7, 8]. Visualization of the true $R_0$ in a population is always implicitly associated with the definition of the true incidence of infection in the population within a given time interval. But the ground-level reality of $R_0$ need not agree with the model-based computation of $R_0$. Several model-based studies have emphasized the need to revise the way we compute incidence rates through simple SIR models. For example, the incidence of infected, say $i(s)$, during $[t_i, t_{i+1}]$ for $i = 0, 1, 2, \ldots, n$ is given by (1), as also observed in [1].

Here $I$ is the number of infected individuals, and $\beta$ is the contact rate between infected and susceptible individuals ($S$). The total population, say $N$, is equal to $S + I + R$ as in a standard SIR model or as in [8], where $R$ is the portion of the population recovered from infection. The quantity $i(s)$ in (1) is the model-based incidence, but the true incidence in the population is difficult to verify unless all those infected during $[t_i, t_{i+1}]$ are diagnosed and reported within that time interval. We know that diagnosis of all those susceptible to infection cannot be complete for several well-known viral infections, such as influenza, COVID-19, H1N1, and H5N1. A sizable number of those infected by these viruses may never report their cases, even after recovery, because they may not know their infection status. In (2), $I_{NR}$, the number of unreported cases (including those not diagnosed), also contributes to newer infections. The first term on the right-hand side of (2) can be verified in practice because it uses diagnosed and reported cases.
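The contrast between a verifiable and an unverifiable contribution to incidence can be made concrete with a small numerical sketch. It assumes a mass-action form in which each contribution is a contact rate times an infected count times the susceptible count, and it allows separate rates for reported and unreported cases. The function name, parameter names, and every number below are hypothetical and chosen only for illustration; in practice the unreported count is exactly the quantity that cannot be observed.

```python
def incidence_terms(beta_r, beta_nr, infected_reported, infected_unreported, susceptible):
    """Split model-based incidence into a reported and an unreported term.

    Assumption: each term has the mass-action form rate * infected * susceptible.
    """
    reported_term = beta_r * infected_reported * susceptible
    unreported_term = beta_nr * infected_unreported * susceptible
    return reported_term, unreported_term


# Hypothetical inputs; the unreported count would be unknown on the ground.
reported, unreported = incidence_terms(
    beta_r=2e-6,
    beta_nr=3e-6,
    infected_reported=500,
    infected_unreported=200,
    susceptible=100_000,
)
total = reported + unreported
print(f"share of incidence that reported data can verify: {reported / total:.3f}")
```

Setting `beta_r` equal to `beta_nr` recovers the case of a single common contact rate.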
The second term on the right-hand side of (2) can never be verified unless the total number of infected cases in the population, newly defined as $I = I_R + I_{NR}$, is known. Newly infected cases, if diagnosed, are added to $I_R$, the reported cases; if not, they are added to $I_{NR}$ for the next iteration. The contact rates among those reported ($\beta_R$) and among those not diagnosed ($\beta_{NR}$) need not always be the same, because those not diagnosed may not take precautions against risky contacts. For example, individuals who have not been tested for COVID-19 but carry the coronavirus might not take precautions similar to those taken by individuals who have tested positive.

In addition to the limitations discussed here, there is a complex ground-level verification problem for $R_0$, which we discuss in what follows. Even if every infected individual on the ground is reported within $[t_0, t_n]$, the reporting may not be complete within the smaller sub-intervals $[t_i, t_{i+1}]$ for $i = 0, 1, \ldots, n$. Suppose that (4) holds and the incidence of reported cases within the interval $[t_0, t_n]$ is expressed as a sum of the incidence of reported cases in the sub-intervals. Then the corresponding model-based estimates can be expressed as in (5). Given (4), and with $R_0$ computed using the reported data within each of the sub-intervals in (5), the value of $R_0$ computed would not be the same as the true value for the corresponding sub-intervals of (5) after adjusting for the reported cases. Moreover, the SIR model-based incidence rates, if verified against reported cases within each interval, need not match the model-based estimates. The ground-level incidence rates due to all cases reported in a larger interval like $[t_0, t_n]$ need not represent the sum of the infected data of the sub-intervals. Retrospective redistribution of the reported cases within each sub-interval was described in [9].
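The gap between complete reporting over $[t_0, t_n]$ and incomplete reporting within sub-intervals can be illustrated with a toy calculation. The sketch below is not the redistribution method of [9]. It assumes, purely for illustration, that a fixed fraction of the cases arising in each sub-interval is reported one sub-interval late; the function name and all counts are hypothetical.

```python
def shift_late_reports(true_incidence, late_fraction):
    """Return reported counts per sub-interval when some cases are reported late.

    Assumption: a fixed fraction of each sub-interval's cases is reported in the
    next sub-interval. Cases from the final sub-interval are not delayed, so the
    total over the whole interval is preserved.
    """
    reported = [0.0] * len(true_incidence)
    last = len(true_incidence) - 1
    for k, cases in enumerate(true_incidence):
        late = cases * late_fraction if k < last else 0.0
        reported[k] += cases - late
        if k < last:
            reported[k + 1] += late
    return reported


true_incidence = [100.0, 200.0, 400.0]  # hypothetical true counts per sub-interval
reported = shift_late_reports(true_incidence, late_fraction=0.25)

# The totals over the whole interval agree, but the sub-interval counts do not.
print("totals:", sum(true_incidence), sum(reported))
for k, (true_k, reported_k) in enumerate(zip(true_incidence, reported)):
    print(f"sub-interval {k}: true={true_k}, reported={reported_k}")
```

Any ratio computed from the reported counts of a single sub-interval would therefore differ from the ratio based on the true counts, even though nothing is missing over the whole interval.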
In [9, 10], wavelets were used to obtain complete information from partial information on reported cases.

Let $I_R(t_0)$ be the reported infected cases at time $t_0$, which are assumed to contribute to the incidence of new cases during $(t_0, t_1]$ among $S(t_1 - t_0)$, the number of susceptible individuals. Let $I_R(t_1 - t_0)$ be the reported incidence during $(t_0, t_1]$. The average number of newly infected reported cases during $(t_0, t_1]$, per infected at $t_0$, is $R(t_1 - t_0) = I_R(t_1 - t_0) / I_R(t_0)$. The effective number of infected at time $t_1$, assuming no deaths of the infected during $(t_0, t_1]$, is $I_R(t_0) + I_R(t_1 - t_0)$. The average number of newly infected reported cases during $(t_1, t_2]$, per infected at $t_1$, is $R(t_2 - t_1) = I_R(t_2 - t_1) / [I_R(t_0) + I_R(t_1 - t_0)]$. Let $\phi(t_2 - t_1)$ be the reported cases of incidence during $(t_1, t_2]$ that originally acquired infection during $(t_0, t_1]$ but were reported within $I_R(t_2 - t_1)$. There are many reasons for improper reporting periods and delayed reporting of cases. Because of improper reporting, the adjusted values of $R(t_1 - t_0)$ and $R(t_2 - t_1)$ are expressed as in (8) and (9), respectively. The numerators on the right-hand sides of (8) and (9) will change if there is evidence of cases reported after time $t_2$ that belong to the periods $(t_0, t_1]$ and $(t_1, t_2]$. The same arguments continue for all the sub-intervals of (5). Similarly, the $I_R(t_0)$ value might need recalculation if the reported cases at $t_0$ were not complete. Similar adjustments in computing the average number of newly infected generated per infected can be introduced if the data contain multiple reporting patterns of single infected cases. This matter is beyond the scope of a short comment.

Suppose there are 570 individuals who are reported to have a certain virus at time $t_0$ in a population of 150,000. Let there be 550 new infections reported during $(t_0, t_1]$ caused by one or more of the 570 infected at time $t_0$. The average number of new infections caused by the 570 during $(t_0, t_1]$ is $550/570 < 1$.
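The ratio just computed is simple arithmetic on reported counts, and a short sketch makes the bookkeeping explicit. The helper names below are hypothetical. The adjusted helper reflects one reading of the adjustment described above, in which cases that were reported late but acquired infection during $(t_0, t_1]$ are added to the numerator for that earlier interval.

```python
from fractions import Fraction


def avg_new_per_infected(new_reported, infected_at_start):
    """Average number of newly reported infections per infected at the interval start."""
    return Fraction(new_reported, infected_at_start)


def adjusted_first_interval(new_reported, late_reported, infected_at_start):
    """Same ratio after adding late-reported cases back to the earlier interval.

    Assumption: late_reported counts cases reported in a later interval that
    acquired infection in this one.
    """
    return Fraction(new_reported + late_reported, infected_at_start)


infected_t0 = 570  # reported infected at t_0
new_t0_t1 = 550    # new infections reported during (t_0, t_1]

ratio = avg_new_per_infected(new_t0_t1, infected_t0)
print(ratio, "is below 1:", ratio < 1)

# With no deaths among the infected, everyone counted so far is still infectious at t_1.
effective_t1 = infected_t0 + new_t0_t1
print("effective infected at t_1:", effective_t1)
```

Exact fractions are used so that comparisons with 1 do not depend on floating-point rounding. Once a count of late-reported cases is known, `adjusted_first_interval` gives the revised ratio for the earlier interval.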
Assuming no deaths of the infected during the interval $[t_0, t_1]$, all $570 + 550 = 1120$ infected individuals at $t_1$ are active and can infect more susceptible people during $(t_1, t_2]$. Suppose that the number of infected cases reported during the time interval $(t_1, t_2]$ is 1500. Then the average number of new infected cases during $(t_1, t_2]$, per infected at $t_1$, is $1500/1120 > 1$. Suppose that 95 individuals out of the 1500 who were reported during $(t_1, t_2]$ actually acquired the disease during the time interval $(t_0, t_1]$. Then the average number of new infections recalculated for the prior time interval $(t_0, t_1]$, caused by the 570 infected, is $(550 + 95)/570 = 645/570 > 1$. Through this example, we demonstrate the precautions needed when using $R_0$ as the gold standard and how the data could be adjusted to reflect the ground reality.

Suppose the size of the total population $N$ is dynamic and $N_1 = S + I + R$ for $N_1 \subset N$. In this situation, we need to assume a flow of population (at a rate, say $\lambda$) from $N$ to $N_1$. This flow could be only to the compartment $S$ or to other compartments of $N_1$ with some rates of motion (should there be infected and recovered people in $N_1^c$, where $N_1^c = N - N_1$). There could be a two-way flow between $N$ and $N_1$, and such a flow could be age-dependent. The SIR model could also have terms with death rates applicable to each compartment. All these modifications, and many more such alterations of the modeling, would produce a different underlying model-based $R_0$. However, our arguments provide for cross-validating the ground-level basic reproductive numbers from the reported data. The adjustments required for an extended model generated by the dynamic $N$ would still be applicable, and new forms of (8) and (9) would apply.

[1] Two complementary model-based methods for calculating the risk of international spreading of a novel virus from the outbreak epicentre. The case of COVID-19
[2] Predictive Mathematical Models of the COVID-19 Pandemic: Underlying Principles and Value of Projections
[3] How Relevant is the Basic Reproductive Number Computed During COVID-19, Especially During Lockdowns?
[4] The failure of R0
[5] Complexity of the Basic Reproduction Number (R0)
[6] Helminth Dynamics: Mean Number of Worms, Reproductive Rates, Handbook of Statistics
[7] A Brief History of R0 and a Recipe for its Calculation
[8] A note on observation processes in epidemic models
[9] True epidemic growth construction through harmonic analysis
[10] Level of underreporting including underdiagnosis before the first peak of COVID-19 in various countries: Preliminary retrospective results based on wavelets and deterministic modeling

The comments of two referees helped us revise our original draft thoroughly. We are very thankful to them.