key: cord-0936409-zefl5ul8 authors: Mallela, A.; Neumann, J.; Miller, E. F.; Chen, Y.; Posner, R. G.; Lin, Y. T.; Hlavacek, W. S. title: Bayesian Inference of State-Level COVID-19 Basic Reproduction Numbers across the United States date: 2021-09-28 journal: medRxiv : the preprint server for health sciences DOI: 10.1101/2021.09.27.21264188 sha: 71c36f749fa22788dd4ea967701fc43fc60d6086 doc_id: 936409 cord_uid: zefl5ul8 Although many persons in the United States have acquired immunity to COVID-19, either through vaccination or infection with SARS-CoV-2, COVID-19 will pose an ongoing threat to non-immune persons so long as disease transmission continues. We can estimate when sustained disease transmission will end in a population by calculating the population-specific basic reproduction number R_0, the expected number of secondary cases generated by an infected person in the absence of any interventions. The value of R_0 relates to a herd immunity threshold (HIT), which is given by 1-1/R_0. When the immune fraction of a population exceeds this threshold, sustained disease transmission becomes exponentially unlikely (barring mutations allowing SARS-CoV-2 to escape immunity). Here, we report state-level R_0 estimates obtained using Bayesian inference. Maximum a posteriori estimates range from 7.1 for New Jersey to 2.3 for Wyoming, indicating that disease transmission varies considerably across states and that reaching herd immunity will be more difficult in some states than others. R_0 estimates were obtained from compartmental models via the next-generation matrix approach after each model was parameterized using regional daily confirmed case reports of COVID-19 from 21-January-2020 to 21-June-2020. Our R_0 estimates characterize infectiousness of ancestral strains, but they can be used to determine HITs for a distinct, currently dominant circulating strain, such as SARS-CoV-2 variant Delta (lineage B.1.617.2), if the relative infectiousness of the strain can be ascertained. On the basis of Delta-adjusted HITs, vaccination data, and seroprevalence survey data, we find that no state has achieved herd immunity as of 20-September-2021. Although many persons in the United States have acquired immunity to COVID-19, either through 24 vaccination or infection with SARS-CoV-2, COVID-19 will pose an ongoing threat to non-immune 25 persons so long as disease transmission continues. We can estimate when sustained disease transmission 26 will end in a population by calculating the population-specific basic reproduction number ℛ ! , the expected 27 number of secondary cases generated by an infected person in the absence of any interventions. The value 28 of ℛ ! relates to a herd immunity threshold (HIT), which is given by 1 − 1/ℛ ! . When the immune fraction 29 of a population exceeds this threshold, sustained disease transmission becomes exponentially unlikely 30 (barring mutations allowing SARS-CoV-2 to escape immunity). Here, we report state-level ℛ ! estimates 31 obtained using Bayesian inference. Maximum a posteriori estimates range from 7.1 for New Jersey to 2.3 32 for Wyoming, indicating that disease transmission varies considerably across states and that reaching herd 33 immunity will be more difficult in some states than others. ℛ ! estimates were obtained from 34 compartmental models via the next-generation matrix approach after each model was parameterized using 35 regional daily confirmed case reports of COVID-19 from 21-January-2020 to 21-June-2020. Our ℛ ! 36 estimates characterize infectiousness of ancestral strains, but they can be used to determine HITs for a 37 distinct, currently dominant circulating strain, such as SARS-CoV-2 variant Delta (lineage B.1.617.2), if 38 the relative infectiousness of the strain can be ascertained. On the basis of Delta-adjusted HITs, 39 vaccination data, and seroprevalence survey data, we find that no state has achieved herd immunity as of 40 20- September-2021. 41 Introduction populations within the US (or elsewhere outside of China and Italy) is unclear. Several studies have 90 estimated ℛ ! for the US at the national level (13-15), the state level (16-18), and the county level (19) (20) . 91 The usefulness of a national estimate is unclear given the heterogeneity of the US, and none of the county-92 level estimates are comprehensive. Some state-level estimates are also incomplete (16, 18) . Because 93 responses to COVID-19 within the US have been and continue to be driven mainly by governors of US 94 states (21), we undertook a study to generate comprehensive state-level ℛ ! estimates through Bayesian 95 inference. With this approach, we were able to quantify uncertainty in each estimate through a parameter 96 posterior distribution. 97 In earlier work, we developed a compartmental model for COVID-19 transmission dynamics that 98 reproduces surveillance data and generates accurate forecasts for the 15 most populous metropolitan 99 statistical areas (MSAs) in the US (22). Here, for each of the 50 states, we found a state-specific parameter 100 posterior conditioned on this model from state-level COVID-19 surveillance data available from January 101 21 to June 21, 2020 (23). From these parameter posteriors, we then obtained region-specific ℛ ! and HIT Model 108 To obtain regional ℛ ! and HIT estimates, we used a compartmental model developed previously 109 (22). We found region-specific parameterizations that allow the model to reproduce surveillance data 110 (daily reports of new confirmed COVID-19 cases) available for each region of interest over a defined 111 period (e.g., January 21 to June 21, 2020). The model is able to account for a variable number of social-112 distancing periods. We considered versions of the model accounting for one, two, and three social-113 distancing periods. The number of social-distancing periods deemed best (i.e., to provide the most 114 parsimonious explanation of the data) for a given time period was determined using the model selection 115 procedure described by Lin et al. (22) . As in the study of Lin et al. (22) , the model has 14 parameters with 116 universal fixed values (applicable to all regions). The model also has 3( + 1) + 3 parameters with 117 region-specific adjustable values determined through Bayesian inference, where + 1 denotes the 118 number of social-distancing periods. In this study, for a given region, we censored case-reporting data 119 whenever the cumulative reported case count was less than 10 cases. We also specified the onset time of 120 the first social-distancing period as the earliest day on which the cumulative reported case count was 200 121 cases or more. A full description of model parameters is given in Lin et al. (22) . 122 Each region-specific model consists of a coupled system of ordinary differential equations (ODEs), 124 which are given by Lin et al. (22) . The ODEs were numerically integrated using the SciPy (29) interface 125 to LSODA (30) and the BioNetGen (31) interface to CVODE (32). Python code was converted to machine 126 code using Numba (33). The initial conditions were determined as in Lin et al. (22) . 127 To find the basic reproduction number ℛ ! , we considered a reduced form of the model of Lin To characterize the initial rate of exponential growth for a local epidemic within a given region, 139 we computed the epidemic growth rate λ as the dominant eigenvalue of the Jacobian of the reduced model 140 linearized at the disease-free equilibrium (36). The derivation of λ is provided in the SI. 141 To infer region-specific values of adjustable model parameters (and ℛ ! and HIT estimates), we 143 followed the Bayesian inference approach of Lin et al. (22) . In inferences, we used all region-relevant 144 confirmed COVID-19 case-count data available in the GitHub repository maintained by The New York 145 Times newspaper (23) for the period starting on 21-January-2020 and ending on 21-May-2020, 21-June-146 2020, or 21-July-2020 (inclusive dates were found to be consistent (SI Fig S1) . To ensure that MCMC sampling procedures converged, we 152 visually inspected trace plots for log-likelihood (SI Fig S2) and parameters (SI Fig S3) and pairs plots (SI 153 Fig S4) . We also performed simulations using maximum likelihood estimates (MLEs) for parameter 154 values to assess agreement of the simulations with the training data (SI Fig S5) . 155 The maximum a posteriori (MAP) estimate of a parameter is the value of the parameter 156 corresponding to the peak of its marginal posterior distribution, where probability density is highest. 157 Because we assumed a proper uniform prior distribution for each of the adjustable parameters, as in the 158 study of Lin et al. (22) , the MAP estimates are MLEs. 159 Bayesian uncertainty quantification 161 Following the Bayesian inference approach of Lin et al. (22) , we quantified uncertainty in 162 predicted trajectories of confirmed case counts for all 50 states, using data from January 21 to June 21, 163 2020. As illustrated in Fig 1 for We can propagate the uncertainty in into uncertainty in ℛ ! and HIT estimates, using the formula 172 for ℛ ! given below and HIT = 1 − 1/ℛ ! . In Fig 2, we show marginal posterior distributions for ℛ ! and 173 HIT for the states of New Jersey, Wyoming, Florida, and Alaska. We provide MAP estimates of the model 174 parameters for all states in SI Table S1 . Model parameters were found to be identifiable in practice. (We 175 have no proof of identifiability.) MAP estimates for ℛ ! and HIT for all 50 states are provided in SI Table 176 S2. These tables also provide 95% credible intervals. These estimates characterize the infectiousness of 177 SARS-CoV-2 ancestral strains in each region of interest. 178 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. To calculate the herd immunity threshold (HIT) for a specific region, we need to know the 180 corresponding region-specific value of the basic reproduction number ℛ ! , which is given by the following 181 formula (obtained as described in Materials and Methods and SI): 182 where characterizes the rate of transmission attributable to contacts between persons who are not 184 protected by social distancing, / denotes the fraction of infected persons who never develop symptoms 185 (i.e., the fraction of asymptomatic cases), / characterizes the rate at which asymptomatic persons recover 186 during the immune clearance phase of infection, 0 characterizes the rate at which symptomatic persons 187 with mild disease recover or progress to severe disease, 1 is a constant characterizing the relative 188 infectiousness of presymptomatic persons compared to symptomatic persons (with the same behaviors), 189 / is a constant characterizing the relative infectiousness of asymptomatic persons compared to 190 symptomatic persons (with the same behaviors), m denotes the number of stages in the incubation period, 191 and 2 characterizes disease progression, from one stage of the incubation period to the next and ultimately 192 to an immune clearance phase. The value of ℛ ! depends on one inferred region-specific parameter, , and 193 seven fixed parameters, which have values taken to be applicable for all regions (i.e., / , / , 0 , 1 , / , 2 , 194 and ). Estimates of these fixed parameters were taken from Lin Delta-adjusted ℛ ! ranges from 5.6 for Wyoming to 18 for New Jersey (from the multiplier given above 201 and SI Table S2 ). The population-weighted Delta-adjusted ℛ ! for the US is 12. These estimates indicate 202 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint that the herd immunity threshold (HIT) for the Delta variant of SARS-CoV-2 ranges from 82% to 94%. 203 HIT estimates are directly determined by estimates of the basic reproduction number, which are 205 related to the initial growth rate of the epidemic in a given region. Here, our ℛ ! estimates are conditioned 206 on a compartmental model that has been parameterized to reproduce case-reporting data available for each 207 region over a five-month period (January 21 to June 21, 2020). We can use parameter estimates obtained 208 for each region to calculate the initial epidemic growth rate λ, which is directly comparable to early 209 surveillance data (Fig 3 and SI Fig S6) . We provide MAP estimates and 95% credible intervals for λ, ℛ ! , 210 and HIT for selected states in Table 1 . MAP estimates and 95% credible intervals for , ℛ ! , and HIT for 211 all states are provided in SI Table S2 . These estimates are based on state-specific marginal posteriors for 212 the parameter of our compartmental model. State-specific MAP estimates and 95% credible intervals 213 for (and other adjustable model parameters) are given in SI Table S1 . As can be seen (e.g., in Fig 3) , 214 our estimates are consistent with early case reporting data during the exponential takeoff phase of disease 215 transmission. 216 For each state, we used data from January 21 to June 21, 2020 to infer the MAP estimate of (and 218 the values of the other region-specific adjustable model parameters). Thus, our estimates are derived from 219 a particular subset of the available surveillance data. To check the robustness of MAP estimates for to 220 variations in training data, we performed a sensitivity analysis wherein we inferred using data collected 221 over three distinct periods in 2020: 1) January 21 to May 21, 2) January 21 to June 21, and 3) January 21 222 to July 21. By visualizing our estimates with a rank order plot (Fig 4) and conducting pairwise two-sample 223 Kolmogorov-Smirnov tests (38), we found that the 4-, 5-, and 6-month training datasets yielded estimates 224 for that were not statistically significantly different from each other. The MAP estimates for obtained 225 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. using the 4-, 5-, and 6-month datasets are listed in SI Table S3 . We assessed sensitivity by computing the 226 relative error between the estimates obtained from the 5-month dataset and the average estimate over 227 all datasets considered. We found that none of the state-level MAP estimates for showed sensitivity 228 (i.e., a relative error exceeding 100% in magnitude) to variations in the training data (SI Table S4 ). The 229 largest relative error was 12% (for Kansas). 230 Global asymptotic stability of the disease-free equilibrium 231 The model of Lin et al. (22) has a globally asymptotically stable disease-free equilibrium (DFE) if 232 ℛ ! < 1, which can be deduced by following the approach of Shuai and van den Driessche (39). As a 233 consequence, the model predicts that the epidemic will be extinguished as the system dynamics are 234 attracted to the DFE. 235 To confirm that the model behaves as expected around the HIT, we conducted a perturbation 236 analysis for the states of New York (Figs 5A and 5B) and Washington (Figs 5C and 5D). We simulated 237 disease dynamics starting from an arbitrarily chosen initial condition near the HIT number of persons, $ , 238 given by the following formula: $ = HIT × ! , where ! denotes the population size of the region 239 considered. We defined the size of our perturbation as ε = 0.2 × $ for Figs 5A and 5C and as = 240 −0.2 × $ for Figs 5B and 5D. The initial condition was ! − $ − 1 + ε susceptible persons, 1 infected 241 person, and $ − recovered persons. As expected, for $ < HIT × ! (Figs 5A and 5C), the number of 242 infectious persons grows over time, whereas for $ > HIT × ! (Figs 5B and 5D), the number of 243 infectious persons decays over time. 244 From our state-specific HIT estimates and other information (discussed below), we were able to 246 calculate percent progress toward herd immunity for each state (Fig 6, SI Table S5 ). We estimated the 247 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint percent progress of each state's population toward herd immunity, ∈ [0%, 100%], using the following 248 equation (the derivation of which is given in the SI): 249 where ℛ ! is the population-specific basic reproduction number that we estimated for ancestral strains (SI 251 Table S2 ), Delta is a multiplier that accounts for the increased transmissibility of SARS-CoV-2 variant 252 Delta, 4 denotes the fraction of the population with immunity acquired through infection, 3 is the fraction 253 of the population that has been vaccinated (24), 4 is the fraction of infected persons who are protected 254 against productive infection (i.e., an infection that can be transmitted to others), and 3 is the fraction of 255 vaccinated persons who are protected against productive infection. Recall that we use Delta = 2.46 (27-256 28). We estimate that 4 = 1.0 (40) and 3 = 0.66 (41). We obtain four different estimates for 4 as 257 follows. In the first case, we obtain 4 as the cumulative number of detected cases within a population 258 divided by the population size. In the second case, we adjust our previous estimate for 4 by a multiplier 259 of 5.8 (42). In other words, we assume that the true disease burden is 5.8 times higher than the detected 260 number of cases. In the third case, we obtain 4 as the fraction of the population that has been infected 261 according to the latest serological survey results reported online at Ref. (25). In the fourth case, we assume 262 As can be seen in Fig 6C, which is based on case reporting data, 18 of the 50 states have reached 266 herd immunity. However, in Fig 6D, which is based on serological survey data, none of the states have 267 reached herd immunity. South Dakota is closest to herd immunity, with 84% of the immune persons 268 required for herd immunity. Idaho is furthest from herd immunity, with 45% of the immune persons 269 required for herd immunity. The mean (median) progress toward herd immunity, across all states, is 63% 270 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. One of our most important findings is quantification of how COVID-19 transmissibility, in terms 273 of the basic reproduction number ℛ ! , varies across the 50 US states. The MAP value of ℛ ! for ancestral 274 strains of SARS-CoV-2 ranges from 2.3 for Wyoming to 7.1 for New Jersey. The population-weighted 275 mean for the US is 4.7. These estimates indicate that the herd immunity threshold (HIT) for the Delta 276 variant of SARS-CoV-2 ranges from 82% to 94%, assuming that Delta is 2.46 times more transmissible 277 than ancestral strains. The uncertainty in each ℛ ! estimate was quantified: 95% credible intervals are 278 indicated in Figure 4 . The 95% credible intervals for ancestral HIT estimates are given in SI Table S2 . 279 Because we can estimate the relative effort required to reach herd immunity across the US (in terms of 280 HIT), resources for vaccination campaigns can be targeted to those areas where it is more difficult to 281 achieve herd immunity. 282 Our ℛ ! and HIT estimates differ from estimates given in previous studies. For example, various 283 researchers derived point estimates for ℛ ! from data using tools from time-series analysis, without 284 assuming an underlying mechanistic model (13, 15). These tools depend on slope estimation and thus can 285 be expected to depend sensitively on noise and errors in early case-reporting data. Ives and Bozzuto (16) are as follows. Our ℛ ! and HIT estimates were obtained from a model consistent with new case-reporting 293 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint data, as illustrated in Figs 1 and 3. We were able to provide estimates for all 50 states (Fig 4, SI Table S2 ), 294 and we were able to obtain a Bayesian quantification of the uncertainty in each estimate (Fig 4, SI Table 295 S2). 296 In the face of Delta, the estimates of Fig6C (based on case reporting data) suggest that a majority 297 of states have yet to achieve herd immunity, and the estimates of Fig 6D ( to 45% for Idaho ( Fig 6D) ~20 months (counting from January 2020) into the COVID-19 pandemic and 304 ~9 months after vaccines became widely available, it seems that this situation will persist for months, if 305 not years. How can the US accelerate the approach to herd immunity? 306 Policies that encourage infection of children and vaccinated persons who have healthy immune 307 systems may be rationalized because such persons seem to be well-protected against severe (but not mild) 308 disease (46) and infected persons seem to have greater protection against productive infection (40). 309 However, this approach has obvious drawbacks, starting with the risks of infection. Another is that non-310 immune persons may not be able to self-identify as such. Unfortunately, it seems that we cannot rely on 311 currently available vaccines to stop community transmission. Delta-adjusted HITs are mathematically 312 impossible to achieve through vaccination alone because these HITs are close to 1 (SI Table S2 ) and 313 vaccine protection against productive infection is imperfect (i.e., 3 is significantly less than 1) (41). Thus, 314 use of Delta-targeted vaccines may be needed to accelerate the approach to herd immunity and to minimize 315 COVID-19 impacts. 316 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. One potential benefit of our comprehensive state-level ℛ ! estimates is that they quantify how 317 differences in social structure and contact patterns across the US-the factors presumably underlying the 318 spatial heterogeneity in and ℛ ! -influence the spread of an aerosol-transmitted virus (47-48). This 319 information, by identifying the regions in the US where transmission is likely to be highest, could be 320 useful for responding to future pandemics caused by viruses similar to SARS-CoV-2. 321 Our study has several notable limitations. Our HIT estimates are potentially biased downward 322 because of general awareness within the US of the impacts of COVID-19 in other countries (e.g., China 323 and Italy), which could have resulted in a fraction of the US population changing their behaviors to protect 324 themselves from COVID-19 before the start of the local epidemic. In addition, our estimation of percent 325 progress toward herd immunity crucially depends on seroprevalence estimates of the true disease burden. 326 These estimates are associated with some uncertainty (49-51). As illustrated in Fig 6, percent progress 327 toward herd immunity is underestimated if serological tests fail to detect all cases of infection. The reader 328 must also be cautioned that our analysis depends on a number of assumptions. For example, we considered 329 a compartmental model in which populations are taken to be well-mixed and to lack age structure. This is 330 clearly a simplification. More refined estimates could be obtained by making the model more realistic, but 331 this would have the drawback of increasing the complexity of inference, which at some point would make 332 inference impracticable. 333 preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. the initial slope of the solid curve corresponds to (calculated as described in Materials and Methods), 569 the crosses indicate empirical cumulative case counts, and the broken line is the model prediction based 570 on MAP estimates for adjustable parameters. The solid curve is derived from the reduced model (Eqs. 1-571 8 in the SI). This curve shows cumulative case counts had there not been any interventions to limit disease 572 transmission. As can be seen, the initial slopes of the solid and broken curves are comparable. We selected 573 = 0 for New Jersey and Wyoming and = 1 for Florida and Alaska. Among 35 states with = 0, New 574 Jersey has the largest inferred λ value (0.45) and Wyoming has the smallest inferred value (0.13). Among 575 15 states with = 1, Florida has the largest inferred value of (0.39) and Alaska has the smallest inferred 576 value of (0.13). It should be noted that, in contrast with Fig 1, preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. all 50 US states. The different symbols refer to different training datasets used to estimate ℛ ! . Open 582 triangles correspond to surveillance data collected from January 21 to May 21, 2020, filled circles 583 correspond to surveillance data collected from January 21 to June 21, 2020, and open squares correspond 584 to surveillance data collected from January 21 to July 21, 2020. Estimates of ℛ ! are sorted by state from 585 largest to smallest values according to the ℛ ! estimates derived from the surveillance data collected for 586 January 21 to June 21, 2020. The whiskers associated with each filled circle indicates the 95% credible 587 interval (inferred from the 5-month dataset). States are indicated using two-letter US postal service state 588 abbreviations (https://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf). 589 590 591 592 593 594 595 596 597 598 599 600 NJ MI FL AZ NY IN IL PA LA MO MS CT GA TN WI TX MD NC ID AL CA MA SC OH CO WA OK NV VA KY KS UT VT OR SD RI MN NH IA AR NM DE NE HI WV ME MT ND All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. given by seroprevalence survey results), and gray bars (Panels B and D) correspond to the fourth scenario 630 (i.e., 4 given by seroprevalence survey results adjusted for lack of detection of asymptomatic cases). 631 Estimates for are sorted by state from largest to smallest values according to the second scenario (Panels 632 A and C) and the fourth scenario (Panels B and D All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. We derive ℛ ! from a simplified form of the compartmental model of Lin et al. (1) . The reduced 24 model is obtained by omitting variables and terms for interventions, including social distancing, 25 quarantine, and self-isolation. Thus, the reduced model describes disease transmission dynamics in the 26 absence of interventions. The equations of the reduced model are as follows: 27 [5] 32 variable denotes the population of susceptible persons. The variablesto 0 denote populations of 41 exposed persons, e.g., persons incubating virus but not symptomatic. As noted earlier, the incubation 42 period is divided into stages. The variable denotes the population of persons who have progressed 43 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. introduced to the $5 compartment will spend in a single visit to the $5 compartment. The matrix , which 63 is non-negative, is defined as follows: 64 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint given region, ! denotes the total population size, " denotes the cumulative number of cases detected, 82 8 denotes the cumulative number of asymptomatic cases, 9 denotes the cumulative number of 83 vaccinations completed, 9,; denotes the number of persons who were susceptible at the time of 84 vaccination, 9,< denotes the number of persons who had recovered from infection at the time of 85 vaccination, = denotes the cumulative number of all cases, ε 9 denotes the fraction of vaccinated 86 individuals protected from productive infection (i.e., an infection that can be transmitted to others), ε < 87 denotes the fraction of recovered individuals protected from productive infection, < denotes the number 88 of individuals who have recovered from infection, HIT denotes the herd immunity threshold for ancestral 89 strains, Delta denotes the infectiousness of SARS-CoV-2 variant Delta relative to ancestral strains, 5 ≡ 90 HIT × ! denotes the threshold number of persons with immunity needed for herd immunity (in the face 91 of ancestral strains), . denotes the estimated number of persons with immunity, + ≡ 8 / = denotes the 92 fraction of all cases that are asymptomatic, < ≡ < / ! denotes the fraction of the population with 93 immunity acquired through infection, and 9 ≡ 9 / ! denotes the fraction of the population that has been 94 vaccinated. 95 We assume that ! is constant. We take < = = to be a good approximation. We assume that we 96 know ! , " , and 9 . We assume that susceptible and recovered individuals have the same probability of 97 being vaccinated. From our assumption that susceptible and recovered individuals have the same 98 probability of being vaccinated, it follows that 9,; = (1 − < ) 9 and 9,< = < 9 . These relations are 99 consistent with 9 ≡ 9,; + 9,< . The number of individuals with immunity (protection from productive 100 infection) is given by 101 . = ε C 9,; + ε < < = ε C (1 − < ) 9 + ε < < [10] 102 We assume that Delta gives the value of for SARS-CoV-2 variant Delta relative to for ancestral 103 strains. We assume all other model parameters are the same for Delta. Thus, Delta ℛ ! is the basic 104 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint reproduction number in the face of Delta. We define , percent progress toward herd immunity, as 105 = . ! Delta ℛ !`/ -× 100% 106 Using the expression given above for . (Eq. 10), 1 − 1/( Delta ℛ ! ) as the Delta-adjusted HIT, and 5 = 107 HIT × ! , we find Eq. 2 in the main text. 108 accounts for an initial social distancing period followed by additional periods. We considered = 0, 1 140 and 2 and selected the best using the model selection procedure of Lin et al. (1) . Crosses indicate 141 observed daily case reports. The shaded region indicates the prediction uncertainty and inferred noise in 142 detection of new cases. The color-coded bands within the shaded region indicate the median and different 143 credible intervals (e.g., dark purple corresponds to the median, the lightest shade of yellow corresponds 144 to the 95% credible interval, and gradations of color between these two extremes correspond to different 145 All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this this version posted September 28, 2021. ; https://doi.org/10.1101/2021.09.27.21264188 doi: medRxiv preprint credible intervals as indicated in the legend). In each panel, the vertical broken line indicates the onset 146 time of the first social-distancing period. For states with = 1, there is an additional (rightmost) broken 147 line, which indicates the onset time of the second social-distancing period. The model was used to make 148 forecasts of new case detection for 14 days after June 21, 2020. The last prediction date was July 5, 2020. 149 150 Figure S6 . Consistency of model-derived λ estimates with empirical growth rates during initial 151 exponential increase in disease incidence in 46 states of the US (i.e., excluding New Jersey, Wyoming, 152 Florida, and Alaska; see Fig 3 in the main text) . In each panel, the initial slope of the solid curve 153 corresponds to (calculated as described in Materials and Methods), the crosses indicate empirical 154 cumulative case counts, and the broken line is the model prediction based on MAP estimates for adjustable 155 parameters. The solid curve is derived from the reduced model (Eqs. 1-8 in the SI). This curve shows 156 cumulative case counts had there not been any interventions to limit disease transmission. As can be seen, 157 the initial slopes of the solid and broken curves are comparable. It should be noted that, in contrast with 158 Fig S5, Daily forecasting of regional epidemics of coronavirus disease with bayesian 162 uncertainty quantification The construction of next-generation matrices for 164 compartmental epidemic models Mathematica: A System for Doing Mathematics by Computer PyBioNetFit and the biological property specification language. iScience The matrix , which is non-singular (i.e., invertible), is defined as follows: 66We find ℛ ! as the spectral radius (i.e., the dominant eigenvalue) of the matrix /-(2), which is given 68 by Eq. 1 in the main text. 69 The epidemic growth rate λ is defined as the dominant eigenvalue of the Jacobian of the reduced 71 model linearized at the disease-free equilibrium (DFE). Thus, λ is the largest root of the characteristic 72 polynomial for the 7-dimensional Jacobian matrix , which is equivalent to − . We used the 73CharacteristicPolynomial function in Mathematica (3) to find : 74The largest root was found numerically. Solutions were based on state-specific estimates for and the 77 estimates of Lin et al. (1) for other parameters in Eq. 9. 78 In this section, we explain the assumptions and derive the formula for our metric of progress 80 toward herd immunity (Eq. 2 in the main text). First, we define the variables used in our analysis. For a 81