key: cord-0112347-mn2zhhtz authors: Shaw, Clara L.; Kennedy, David A. title: What the reproductive number R_0 can and cannot tell us about COVID-19 dynamics date: 2020-06-25 journal: nan DOI: nan sha: 996c6f1e9dd2ade0deab1903582748d7342148d5 doc_id: 112347 cord_uid: mn2zhhtz

The reproductive number R_0 (and its value after initial disease emergence R) has long been used to predict the likelihood of pathogen invasion, to gauge the potential severity of an epidemic, and to set policy around interventions. However, often-ignored complexities have generated confusion around use of the metric. This is particularly apparent with the emergent pandemic virus SARS-CoV-2, the causative agent of COVID-19. We address some of these misconceptions, namely, how R changes over time, varies over space, and relates to epidemic size by referencing the mathematical definition of R and examples from the current pandemic. We hope that a better appreciation of the uses, nuances, and limitations of R facilitates a better understanding of epidemic spread, epidemic severity, and the effects of interventions in the context of SARS-CoV-2.

With the emergence of SARS-CoV-2, the novel coronavirus responsible for COVID-19, much attention has been given to the reproductive number, ℛ, and its initial state, ℛ₀ (Viceconte and Petrosillo, 2020). ℛ₀ is the expected number of infections generated by an infected individual in an otherwise fully susceptible population (Anderson and May, 1991; Diekmann et al., 1990). Under relatively general assumptions, ℛ₀ can be used to determine the probability that an emerging disease will cause an epidemic, the final size of an epidemic, and what level of vaccination would be required to achieve herd immunity (Anderson and May, 1991; Delamater et al., 2019; Heffernan et al., 2005; Roberts, 2007). Therefore, when interpreted correctly, and in conjunction with additional relevant information, it can yield valuable insight.
However, misinterpretation may lead to faulty conclusions regarding disease dynamics.

The virus SARS-CoV-2 emerged in Wuhan, China in late 2019 and has since become pandemic, causing over 413,000 deaths worldwide by June 10th, 2020 ("Johns Hopkins University & Medicine Coronavirus Resource Center," 2020) in addition to severe economic distress. Policy makers have relied on estimates of ℛ₀ to tailor control measures (e.g. Ferguson et al., 2020), but these estimates vary tremendously within and between populations around the globe (Figure 1). It is important to understand why these estimates vary. It is also important to understand how the utility of ℛ is limited. Here, we derive and explain some of the key nuances of ℛ and ℛ₀, paying particular attention to insights and limitations with respect to the emerging pathogen SARS-CoV-2.

Figure 1. Estimates of the ℛ₀ of SARS-CoV-2 vary substantially between locations affected by the pandemic (Choi and Ki, 2020; Deb and Majumdar, 2020; Giordano et al., 2020; Johndrow et al., 2020; Korolev, 2020; Lewnard et al., 2020; Majumder and Mandl, 2020; Peirlinck et al., 2020; Pitzer et al., 2020; Ranjan, 2020; Read et al., 2020; Riou and Althaus, 2020; Sanche et al., 2020; Senapati et al., 2020; Shim et al., 2020; Singh and Adhikari, 2020; Tang et al., 2020; Wu et al., 2020; Yuan et al., 2020; Zhao et al., 2020). Each point represents the average of a compilation of ℛ₀ estimates from different studies (sample size noted alongside means). Median estimates were used for any studies that provided more than one estimate. Error bars show plus or minus 1 standard error.

Defining ℛ mathematically

A general definition

How many new infections will be caused by a single infected individual?
For a directly transmitted pathogen, the answer to this question can be written as:

$$\mathcal{R} = \int_0^\infty c_\tau \, p_\tau \, d_\tau \, \mathrm{d}\tau \qquad (1)$$

Above, ℛ is the reproductive number, $c_\tau$ is the rate of contacts that an infected individual has with susceptible individuals at time τ post infection, $p_\tau$ is the probability that a contact at time τ results in a new infection, and $d_\tau$ is the probability of still being infected at time τ. Notably, we could have combined $p_\tau$ and $d_\tau$ into a single parameter since the probability of infection given contact falls to zero after an individual recovers, but we prefer this more explicit formulation. Equation (1) yields ℛ, the total number of infections one infected individual would generate over the course of their infection. When the population is fully susceptible, as would be expected at the beginning of an outbreak, equation (1) yields ℛ₀.

Note that we have neglected to explicitly incorporate individual variation and temporal variation in contact rates, the probability that a contact results in a new infection, and the time to recovery. However, ℛ and ℛ₀ are intended to be averages, and so this variation is inherently a part of the reproductive number calculation. This variation could be explicitly included with additional subscripts to denote all possible infected hosts, noninfected hosts, times since the epidemic began, and ages of infections. For simplicity, hereafter, we will not use any subscripts for $c$, $p$, and $d$.

Calculating ℛ₀ using an epidemiological model

Estimating the individual parameters in equation (1) requires extensive data collection for a specific pathogen, host population, and time. Alternative approaches to estimating ℛ₀ therefore frequently rely on epidemiological models and epidemic data. Following the lead of Kermack and McKendrick (1927), epidemics have often been modeled as a set of ordinary differential equations.
In the simplest Susceptible-Infectious-Recovered (SIR) model,

$$\frac{\mathrm{d}S}{\mathrm{d}t} = -\beta S I \qquad (2.1)$$

$$\frac{\mathrm{d}I}{\mathrm{d}t} = \beta S I - \gamma I \qquad (2.2)$$

$$\frac{\mathrm{d}R}{\mathrm{d}t} = \gamma I \qquad (2.3)$$

the state variables $S$, $I$, and $R$ are the densities of susceptible, infectious, and recovered individuals in a population. Note that "$R$" here is distinct from the reproductive number "ℛ", but we use both for historical reasons. Here, $\beta$ is the transmission coefficient of the pathogen and $\gamma$ is the rate of recovery of infected individuals. Epidemiological data can be used to infer the values of model parameters, including for stochastic or more complex model formulations (for example, Kennedy et al., 2018).

To derive ℛ₀ from these equations, we put the SIR parameters in the context of equation (1). The per infected individual transmission rate, $\beta S$, is equivalent to the contact rate an infected individual has with susceptible individuals, $c$, multiplied by the probability of a new infection resulting from a contact, $p$. Above, the rate of recovery $\gamma$ is constant over time, meaning that the time individuals remain infected is exponentially distributed in this SIR model. The probability of remaining infected, $d$, is thus equal to $e^{-\gamma \tau}$, where $\tau$ is the time since infection. Therefore,

$$\mathcal{R} = \int_0^\infty \beta S \, e^{-\gamma \tau} \, \mathrm{d}\tau$$

Since $S$ changes slowly at the beginning of an epidemic when $I$ is small, we can treat it as a constant with respect to $\tau$, equal to its initial value $S_0$. This assumption allows us to analytically solve the integral, which yields

$$\mathcal{R}_0 = \frac{\beta S_0}{\gamma}$$

However, most biological systems are unlikely to conform to the assumptions of this simple SIR model. The model can be modified to better reflect the biology of the system being modeled (for examples, see Keeling and Rohani, 2007), but added complexity can make ℛ₀ more difficult or impossible to solve analytically. In these cases, ℛ₀ can be calculated as the dominant eigenvalue of the next generation matrix (Diekmann et al., 1990) or by other mathematical methods (Heffernan et al., 2005).
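As a rough check on the derivation above, the following minimal sketch compares the closed-form result for the simple SIR model, βS₀/γ, with a direct numerical evaluation of the integral of βS₀·exp(−γτ). The function names and the parameter values (β = 0.5, γ = 0.2, S₀ = 1) are hypothetical and chosen only for illustration; they are not estimates for SARS-CoV-2.

```python
import math


def r0_closed_form(beta, gamma, s0=1.0):
    """Closed-form R0 for the simple SIR model: beta * S0 / gamma."""
    return beta * s0 / gamma


def r0_by_integration(beta, gamma, s0=1.0, steps=200_000):
    """Trapezoid-rule estimate of the integral of beta * S0 * exp(-gamma * tau).

    The infinite upper limit is truncated at 50 / gamma, where the
    integrand has decayed by a factor of exp(-50) and is negligible.
    """
    tau_max = 50.0 / gamma
    h = tau_max / steps

    def integrand(tau):
        return beta * s0 * math.exp(-gamma * tau)

    total = 0.5 * (integrand(0.0) + integrand(tau_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h


# Hypothetical parameter values, for illustration only.
beta, gamma = 0.5, 0.2
print("closed form:", r0_closed_form(beta, gamma))
print("numerical:  ", r0_by_integration(beta, gamma))
```

The two values should agree closely, which is simply a restatement of the fact that an exponentially distributed infectious period has mean 1/γ.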
Calculating ℛ₀ without an epidemiological model

ℛ₀ can also be estimated without an epidemiological model, which can be especially useful if parameter estimates or even an appropriate model structure are not yet known. In principle, one could calculate ℛ₀ by simply counting the cases attributed to infected individuals at or near the beginning of an outbreak. In practice, this method is rarely employed since contact tracing networks are rarely established during the earliest phase of an emerging disease outbreak (but see Pung et al., 2020) and estimates could be inaccurate due to bias towards observing large chains of transmission.

ℛ₀ can also be inferred from the growth rate of cases early in an outbreak. Since the number of susceptible individuals changes slowly during the initial stages of an outbreak, early case growth rates can be approximated by exponential growth: the number of cases $N_t = N_0 e^{rt}$, where $r$ is the epidemic growth rate. If the number of cases is known for at least two time points, one could calculate $r$. The relationship between ℛ₀ and $r$ depends on the distribution of the generation interval, which is defined as the amount of time between infection of two individuals where the second infection is caused by the first. The generation interval distribution can be approximated by direct observation or specified by an epidemiological model (Wallinga and Lipsitch, 2007). For the model presented in eqs. (2.1)-(2.3) the generation interval is exponentially distributed, and therefore $\mathcal{R}_0 = 1 + r/\gamma$ (Wallinga and Lipsitch, 2007).

No matter the method used to calculate it, limited data or unreliable data early in an epidemic can make it difficult to constrain ℛ₀. The World Health Organization originally estimated the ℛ₀ of SARS-CoV-2 to be between 1.4 and 2.5 (WHO, 2020). More recent estimates of ℛ₀ have varied from 2.2 to 6.47 for the beginning of the Wuhan outbreak (Figure 1). This represents tremendous uncertainty when attempting to use ℛ₀ for public health planning.
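The growth-rate approach described in this section can be sketched in a few lines. The example below assumes exponential early growth and the exponentially distributed generation interval of eqs. (2.1)-(2.3), so that ℛ₀ = 1 + r/γ. The function names, the case counts (100 cases on day 0, 400 on day 10), and the recovery rate γ = 0.2 per day are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def growth_rate(cases_1, t_1, cases_2, t_2):
    """Exponential growth rate r implied by case counts at two time points."""
    return math.log(cases_2 / cases_1) / (t_2 - t_1)


def r0_from_growth_rate(r, gamma):
    """R0 = 1 + r / gamma, valid for an exponential generation interval."""
    return 1.0 + r / gamma


# Hypothetical early-outbreak counts and recovery rate (per day).
r = growth_rate(cases_1=100, t_1=0, cases_2=400, t_2=10)
print("growth rate r per day:", round(r, 4))
print("implied R0:", round(r0_from_growth_rate(r, gamma=0.2), 3))
```

A different generation interval distribution would change the mapping from r to ℛ₀, so the same case counts can imply quite different values of ℛ₀ under different model assumptions.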
For example, if we were using these estimates to design a vaccine campaign capable of achieving herd immunity, our vaccination target (calculated as 1 − 1/ℛ₀ under the assumptions of eqs. 2.1-2.3) would be 29% of the population at ℛ₀ = 1.4 or 85% at ℛ₀ = 6.47. Even with better estimates of ℛ₀, however, misconceptions around this metric lessen its practical utility.

Misconception 1: ℛ₀ explains future dynamics

As we have explained, ℛ₀ is calculated during the early stages of an epidemic because of its value in determining future infection dynamics (Anderson and May, 1991; Ma and Earn, 2006). But ℛ changes over time in two important ways that limit its value in understanding future dynamics: first, as awareness of infection leads hosts to alter their behavior, and second, as outbreaks progress and new hosts become limiting.

Shifts in behavior that influence contact rates or the probability of infection given contact can alter the reproductive number ℛ over extremely short timescales. For example, as awareness of the SARS-CoV-2 epidemic grew in the United States in March 2020, human mobility ground to a near halt (Warren and Skillman, 2020), presumably reducing contact rates $c$. Other individual behavioral changes such as increased handwashing and mask wearing (Belot et al., 2020; Goldberg et al., 2020) have likely reduced the probability of transmission given contact $p$. Such bottom-up forces combined with top-down government-imposed interventions (e.g. school closures, banned gatherings) reduced ℛ to below 1 (the threshold for epidemic persistence) by late April in some states (Johndrow et al., 2020; Miller et al., 2020). Similar reductions to ℛ were documented in China (R. Tian et al., 2020) and other countries (Ensser et al., 2020; Giordano et al., 2020). Indeed, in models of the 1918 influenza pandemic, incorporating a behavioral response to death rates improved model fits (Bootsma and Ferguson, 2007; He et al., 2013).
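The arithmetic behind these statements can be sketched as follows. The first function reproduces the classical herd immunity threshold 1 − 1/ℛ₀ quoted above for ℛ₀ = 1.4 and 6.47. The second function is a simplifying assumption rather than a result from the text: it supposes that a behavioral change scales the contact rate c and the transmission probability p by constant factors throughout the infection, so that ℛ scales by the product of those factors. The function names and the example reductions (50% fewer contacts, 30% lower transmission probability, ℛ₀ = 2.5) are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Classical threshold 1 - 1/R0 under the homogeneous SIR assumptions."""
    if r0 <= 1.0:
        return 0.0  # no immunity is needed when R0 is at or below 1
    return 1.0 - 1.0 / r0


def effective_r(r0, contact_reduction=0.0, transmission_reduction=0.0):
    """R after uniform fractional reductions in c and p (assumed constant in time)."""
    return r0 * (1.0 - contact_reduction) * (1.0 - transmission_reduction)


# Vaccination targets for the low and high R0 estimates mentioned in the text.
for r0 in (1.4, 6.47):
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.0%}")

# Hypothetical behavioral change: 50% fewer contacts, 30% lower p.
print("effective R:", effective_r(2.5, contact_reduction=0.5, transmission_reduction=0.3))
```

Under this simple scaling, a combined reduction in c·p of more than 1 − 1/ℛ₀ is needed to push ℛ below 1, which is the same quantity as the classical vaccination target.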
For SARS-CoV-2, behavioral changes may cause ℛ to fluctuate above and below 1 at different times based on the perceived threat of COVID-19. If that is the case, behavioral fluctuations and intermittent lockdowns may prevent hospital capacity from becoming overwhelmed (Tuite et al., 2020).

While behavioral changes can temporarily reduce ℛ as described above, more sustainable reductions in ℛ are typically achieved when susceptible individuals are removed from populations either through naturally acquired immunity or vaccination. However, the impact of removing susceptible individuals is often more complicated than under the assumptions of classical SIR models such as eqs. 2.1-2.3. When transmission rates are heterogeneous within a population, meaning that some individuals are more likely to contract infection than others, ℛ declines faster than predicted by eqs. 2.1-2.3 (May and Anderson, 1987). This is because those most susceptible (for example, due to high exposure or lower inherent immunity) will become infected earlier in an epidemic, leaving a susceptible population that is on average more resistant (Gomes et al., 2020; Langwig et al., 2017; May and Anderson, 1987). This fact inherently limits the utility of the classical formulation of the "herd immunity" threshold, 1 − 1/ℛ₀. Heterogeneity could be included in models to calculate a more realistic herd immunity threshold, but for SARS-CoV-2, data describing heterogeneity in infection risk are still highly uncertain and likely to continue changing through time due to individual or government-mandated responses (Dolbeault and Turinici, 2020). Indeed, current estimates of the fraction of people infected with SARS-CoV-2 in the hardest hit communities (e.g.
15.5% in a small German town exposed to a super spreading event (Streeck et al., 2020) and 19.9% in New York City ("Information on novel coronavirus," 2020)) may be approaching thresholds required for herd immunity calculated under assumptions of extreme heterogeneity (Gomes et al., 2020), but if heterogeneity in transmission is not extreme or if it decreases in the future, herd immunity thresholds will also shift higher.

Misconception 2: The reproductive number is constant over space

The ℛ₀ of many pathogens is often referred to as a known value. For example, the ℛ₀ of measles is 12-14, polio is 5-7, and pertussis is 12-17 (Doherty et al., 2016). For SARS-CoV-2, estimates typically range from 2-3 (Ying . However, the parameters ($c$, $p$, and $d$) that make up ℛ₀ can differ substantially from place to place (Figure 1; Delamater et al., 2019). It follows that interventions to reduce ℛ to less than 1 may need to vary in aggressiveness across locations (Stier et al., 2020).

Since ℛ₀ differs between groups of people, combining multiple groups together to estimate a population-wide ℛ₀ can produce misleading notions of disease spread. For example, though high measles vaccination rates in the United States keep ℛ below 1 nation-wide, smaller unvaccinated communities still experience serious outbreaks (Leslie et al., 2018). In the current COVID-19 pandemic, disease transmission has so far been much higher in refugee and low income populations compared to non-refugee and high income populations (Chopra and Sobel, 2020; Lau et al., 2020; Ruiz-Euler et al., 2020). Since ℛ₀ is an average, combining communities with high and low transmission may yield an estimate of ℛ₀ < 1, yet disease may still readily spread (Li et al., 2011). On the other hand, splitting populations may mean missing transmission events that occur between populations, thus underestimating ℛ₀ (Smith et al., 2009).
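A toy calculation illustrates how pooling can hide a spreading outbreak. The sketch below assumes two groups that do not transmit to each other and treats the pooled reproductive number as the average over infected individuals, weighted by each group's share of current infections. The function name, the group shares, and the group-level values are hypothetical and are not estimates for any real population.

```python
def pooled_r(groups):
    """Average reproductive number across groups.

    `groups` is a list of (share_of_infected, r) pairs whose shares sum to 1.
    Assumes no transmission between groups.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of infected individuals must sum to 1")
    return sum(share * r for share, r in groups)


# Hypothetical example: most infections are in a low-transmission group,
# while a small group has a reproductive number above 1.
groups = [(0.9, 0.8), (0.1, 1.5)]
print("pooled R:", round(pooled_r(groups), 2))
print("largest group-level R:", max(r for _, r in groups))
```

Here the pooled value is below 1 even though infections in the second group would continue to grow, which is the pattern described for unvaccinated communities and measles.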
Awareness of the consequences of how people are grouped can help us interpret ℛ values.

Within groups, behavior associated with work, home, and recreation affects contact rates $c$. As we have discussed above, SIR models often assume that contact rates, and thus ℛ₀, depend on host density. Although evidence is mixed as to whether larger cities have higher values of ℛ₀ for SARS-CoV-2 (Heroy, 2020; Stier et al., 2020), built environments (e.g. hospitals, airport terminals, factories) do often have high values of ℛ₀ (Dietz et al., 2020). This may partially explain patterns of explosive transmission in venues such as cruise ships and meat packing facilities (Dyal et al., 2020). Household contacts have also been particularly important to SARS-CoV-2 transmission dynamics (Bi et al., 2020). Therefore, differences in ℛ₀ between populations could in part be due to the difference in household sizes between countries and cultures (Gardner et al., 2020; Singh and Adhikari, 2020; Su et al., 2020).

Similarly, the probability that a new infection results from contact, $p$, and the probability of remaining infected over time, $d$, may vary by population. For SARS-CoV-2, individuals with severe symptoms have 60 times more viral RNA in nasal swabs than those with mild symptoms, which likely increases their ability to transmit the virus and the amount of time they remain infected. Since older individuals are more likely to develop severe infection, ℛ₀ is likely to be greater in populations with older individuals, such as in nursing homes (McMichael et al., 2020) or in developed countries (Dowd et al., 2020).

Misconception 3: The reproductive number is enough to tell us how large an epidemic will be

It is tantalizing to imagine that ℛ₀ can be used to predict the extent of an outbreak, since it can be calculated during the early stages of an epidemic.
Indeed, ℛ₀ is related to final epidemic size (Kermack and McKendrick, 1927), but this relationship can be substantially affected by the fraction of the population infected initially and heterogeneity in transmission. If we rescale population sizes such that the initial susceptible population size $S_0 = 1$, and we assume that population sizes are sufficiently large to neglect demographic stochasticity (Hartfield and Alizon, 2013; Tildesley and Keeling, 2009), then $Z$, the fraction of the population infected during an epidemic, can be determined from the final epidemic size equation, $Z = S_0 \left(1 - e^{-\mathcal{R}_0 (Z + I_0)}\right)$. Note that this equation prominently features $I_0$, the fraction of the population infected at the beginning of the outbreak (or at the beginning of an intervention). Efforts to reduce the reproductive number below 1 are understandably a high priority, but when ℛ₀ is close to or less than 1, the final outbreak size is more sensitive to changes in the fraction of infected individuals $I_0$ than it is to changes in ℛ₀ (Figure 2). ℛ₀ is a critical threshold for pathogen invasion, which matters when $I_0$ is small, but which becomes less important as $I_0$ gets larger (Figure 2). Now that SARS-CoV-2 infection rates are already substantial in populations around the globe, $I_0$ must be considered in addition to ℛ. For example, Pei et al. (2020) estimated that implementing social distancing policies one week earlier could have reduced the cases in the United States by early May 2020 by 55% (over 700,000 cases) by keeping the number of infected individuals low at the time such policies were implemented.

Figure 2. Epidemic size contours and shading show that when ℛ₀ is close to 1, the epidemic is more strongly influenced by a reduction of $I_0$ than by a reduction of ℛ₀. For instance, if ℛ₀ = 0.95 (red dashed line), the epidemic could infect from less than 0.1% to greater than 16% of the population as $I_0$ ranges from 0% to 3% of the population.
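The sensitivity to the initial fraction infected can be explored numerically. The sketch below solves the final size relation for the simple SIR model, written here as Z = S₀(1 − exp(−ℛ₀(Z + I₀))), by fixed-point iteration starting from Z = S₀. The function name, tolerance, and iteration cap are hypothetical choices made for illustration, and the sketch inherits the homogeneous-mixing assumptions of eqs. (2.1)-(2.3).

```python
import math


def final_size(r0, i0, s0=1.0, tol=1e-12, max_iter=100_000):
    """Solve Z = S0 * (1 - exp(-R0 * (Z + I0))) for the final size Z.

    Starts from Z = S0 and iterates downward; because the right-hand side
    is increasing in Z, the iterates decrease monotonically toward the
    largest solution.
    """
    z = s0
    for _ in range(max_iter):
        z_next = s0 * (1.0 - math.exp(-r0 * (z + i0)))
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# With R0 just below 1, the outcome depends strongly on I0.
for i0 in (0.0, 0.01, 0.03):
    print(f"R0 = 0.95, I0 = {i0:.2f}: Z = {final_size(0.95, i0):.4f}")

# With R0 well above 1, even a tiny I0 leads to a large epidemic.
print(f"R0 = 2.50, I0 = 0.000001: Z = {final_size(2.5, 1e-6):.4f}")
```

Convergence slows as ℛ₀ approaches 1 with I₀ near 0, so a bracketing root finder may be preferable in that regime.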
Epidemic size was calculated using the final size equation, $Z = S_0 \left(1 - e^{-\mathcal{R}_0 (Z + I_0)}\right)$, where $S_0 = 1$. Shading indicates the cube root of epidemic size with lighter colors corresponding to smaller outbreaks.

While a final epidemic size can be calculated using ℛ₀ and $I_0$, the final size equation above does not apply to populations with heterogeneous infection risk (Andreasen, 2011; Ball, 1985; Hébert-Dufresne et al., 2020; Ma and Earn, 2006). Heterogeneity could in principle be incorporated into the final epidemic size equation (Dwyer et al., 2000), but estimating heterogeneity early in an epidemic can be challenging. Moreover, as we describe in misconception 1, heterogeneity in infection risk can change over time as a result of human behavior or interventions, such as the shutdowns in response to the COVID-19 epidemic (Dolbeault and Turinici, 2020; Ruiz-Euler et al., 2020). Estimates of how future government restrictions and behavioral changes will alter heterogeneity in infection risk are thus critical for assessing the likely impact of the outbreak (Gomes et al., 2020). Such estimates are also key in determining thresholds for herd immunity and in prioritizing the distribution of interventions such as vaccines when they first become available (Atkinson and Cheyne, 1994; Giambi et al., 2019).

As we have discussed, the reproductive number ℛ and its initial value ℛ₀ can be used to assess the potential for disease invasion and persistence, to predict the extent of an epidemic, and to infer the impact of interventions and of relaxing control measures. However, the utility of ℛ and ℛ₀ can easily be overstated. We have focused on three misconceptions that can lead to inaccurate perceptions of disease dynamics. These misconceptions are problematic no matter the complexity of the model or the reliability of the data used to estimate ℛ because populations vary over space and time and in their changing responses to disease.
Considering these nuances when interpreting ℛ allows for a stronger understanding of the patterns with which SARS-CoV-2 virus has traversed the globe, why it has impacted some populations more than others, and how best to limit future transmission.

Infectious diseases of humans: dynamics and control The final size of an epidemic and its relation to the basic reproduction number Immunization in urban areas: Issues and strategies Deterministic and stochastic epidemics with several kinds of susceptibles Six-country survey on COVID-19 Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study The effect of public health measures on the 1918 influenza pandemic in U.S. cities Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study Estimating the reproductive number and the outbreak size of COVID-19 in Korea Detroit under siege: The enemy within: The impact of the Covid-19 collision A time series method to analyze incidence pattern and estimate reproduction number of COVID-19 Complexity of the basic reproduction number (R0) On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations Novel coronavirus (COVID-19) pandemic: built environment considerations to reduce transmission Vaccine impact: Benefits for human health Heterogeneous social interactions and the COVID-19 lockdown outcome in a multi-group SEIR model Demographic science aids in understanding the spread and fatality rates of COVID-19 Pathogen-driven outbreaks in forest defoliators revisited: Building models from experimental data COVID-19 among workers in meat and poultry processing facilities -19 States Modest effects of contact reduction measures on the reproduction number of SARS-CoV-2 in the most affected European countries and the US Impact of non-pharmaceutical interventions
(NPIs) to reduce COVID-19 mortality and healthcare demand Mapping county-level mobility pattern changes in the United States in response to COVID-19 Intervention strategies against COVID-19 and their estimated impact on Swedish healthcare capacity National immunization strategies targeting migrants in six European countries Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy Mask-wearing increases after a government recommendation: A natural experiment in the U.S. during the COVID-19 pandemic Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold Introducing the outbreak threshold in epidemiology Inferring the causes of the three waves of the 1918 influenza pandemic in England and Wales Beyond R0: Heterogeneity in secondary infections and probabilistic epidemic forecasting Perspectives on the basic reproductive ratio Metropolitan-scale COVID-19 outbreaks: how similar are they Transmissibility of 2019-nCoV, WHO Collaborating Centre for Infectious Disease Modelling Estimating the number of SARS-CoV-2 infections and the impact of social distancing in the United States Modeling infectious diseases in humans and animals Modeling Marek's disease virus transmission: A framework for evaluating the impact of farming practices and evolution A contribution to the mathematical theory of epidemics Identification and estimation of the SEIRD epidemic model for COVID-19 Vaccine effects on heterogeneity in susceptibility and implications for population health management COVID-19 in humanitarian settings and lessons learned from past epidemics It could have been much worse: The Minnesota measles outbreak of 2017 Incidence, clinical outcomes, and transmission dynamics of hospitalized 2019 coronavirus disease among 9596321 individuals residing in California and Washington, United States: a prospective cohort study The failure of R0 Early transmission dynamics in Wuhan, China, of novel 
coronavirus-infected pneumonia Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science Efficacy of face mask in preventing respiratory virus transmission: a systematic review and meta-analysis Time-varying transmission dynamics of novel coronavirus pneumonia in China The reproductive number of COVID-19 is higher compared to SARS coronavirus Viral dynamics in mild and severe cases of COVID-19 Generality of the final size formula for an epidemic of a newly invading infectious disease Early transmissibility assessment of a novel coronavirus in Wuhan Transmission dynamics of HIV infection Mobility trends provide a leading indicator of changes in SARS-CoV-2 transmission Transmission potential of the novel coronavirus (COVID-19) onboard the diamond Princess Cruises Ship Early epidemiological assessment of the transmission potential and virulence of 2019 novel coronavirus in Wuhan City: China Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak Differential effects of intervention timing on COVID-19 spread in the United States Outbreak dynamics of COVID-19 in China and the United States The impact of changes in diagnostic testing practices on estimates of COVID-19 transmission in the United States Investigation of three clusters of COVID-19 in Singapore: implications for surveillance and response measures Predictions for COVID-19 outbreak in India using epidemiological models Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions Pattern of early human-to-human transmission of Wuhan The pluses and minuses of ℛ0 Mobility patterns and income distribution in times of crisis High contagiousness and rapid spread of Severe Acute Respiratory Syndrome Coronavirus 2 Impact of intervention on the spread of COVID-19 in India: A model based study Transmission
potential and severity of COVID-19 in South Korea Age-structured impact of social distancing on the COVID-19 epidemic in India Can we spend our way out of the AIDS epidemic? A world halting AIDS model COVID-19 attack rate increases with city size. arXiv Infection fatality rate of SARS-CoV-2 infection in a German community with a super-spreading event Influence of socio-ecological factors on COVID-19 risk: a cross-sectional study based on 178 countries/regions worldwide. medRxiv 1-35 Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions China. Science 642, eabb6105 Is R0 a good predictor of final epidemic size: Foot-and-mouth disease in the UK Mathematical modelling of COVID-19 transmission and mitigation strategies in the population of Ontario COVID-19 R0: Magic number or conundrum? How generation intervals shape the relationship between growth rates and reproductive numbers Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV) Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study Monitoring transmissibility and mortality of COVID-19 in Europe A data-driven analysis in the early phase of the outbreak

We thank Amrita Bhattacharya for feedback on earlier versions of the text. This work was supported by startup funds from The Pennsylvania State University. DAK was also partially supported by National Science Foundation grant DEB-1754692. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.