key: cord-0746817-zxju15oq authors: Newcomb, Ken; Smith, Morgan E.; Donohue, Rose E.; Wyngaard, Sebastian; Reinking, Caleb; Sweet, Christopher R.; Levine, Marissa J.; Unnasch, Thomas R.; Michael, Edwin title: Iterative data-driven forecasting of the transmission and management of SARS-CoV-2/COVID-19 using social interventions at the county-level date: 2022-01-18 journal: Sci Rep DOI: 10.1038/s41598-022-04899-4 sha: 51ba58897462b104757b1ea1716d0272fe9c04a3 doc_id: 746817 cord_uid: zxju15oq The control of the initial outbreak and spread of SARS-CoV-2/COVID-19 via the application of population-wide non-pharmaceutical mitigation measures have led to remarkable successes in dampening the pandemic globally. However, with countries beginning to ease or lift these measures fully to restart activities, concern is growing regarding the impacts that such reopening of societies could have on the subsequent transmission of the virus. While mathematical models of COVID-19 transmission have played important roles in evaluating the impacts of these measures for curbing virus transmission, a key need is for models that are able to effectively capture the effects of the spatial and social heterogeneities that drive the epidemic dynamics observed at the local community level. Iterative forecasting that uses new incoming epidemiological and social behavioral data to sequentially update locally-applicable transmission models can overcome this gap, potentially resulting in better predictions and policy actions. Here, we present the development of one such data-driven iterative modelling tool based on publicly available data and an extended SEIR model for forecasting SARS-CoV-2 at the county level in the United States. Using data from the state of Florida, we demonstrate the utility of such a system for exploring the outcomes of the social measures proposed by policy makers for containing the course of the pandemic. 
We provide comprehensive results showing how the locally identified models could be employed for assessing the impacts and societal tradeoffs of using specific social protective strategies. We conclude that it could have been possible to lift the more disruptive social interventions related to movement restriction/social distancing measures earlier if these were accompanied by widespread testing and contact tracing. These intensified social interventions could have potentially also brought about the control of the epidemic in low- and some medium-incidence county settings first, supporting the development and deployment of a geographically-phased approach to reopening the economy of Florida. We have made our data-driven forecasting system publicly available for policymakers and health officials to use in their own locales, so that a more efficient coordinated strategy for controlling SARS-CoV-2 region-wide can be developed and successfully implemented.
Model fits compared to confirmed daily case data in 9 representative Florida counties. Gray curves represent county-specific model predictions and red points represent confirmed case data obtained from Johns Hopkins University 47. Fitting was started after at least 10 confirmed cases were reported. Results are from model fits to representative counties, stratified by initial incidence growth rate in each county.
Because the dynamics of COVID-19 are significantly influenced by changes in social behaviors, it is critical to iteratively calibrate transmission models, such as the present SEIR model, to new data and actively update the resulting predictions accordingly 24, 27. Supplementary Fig. S2 demonstrates the shift in model predictive performance as the model is updated sequentially with various lengths of incoming longitudinal data. 
Cross-validation analysis showed that carrying out such model calibrations to 2-week sequential blocks of data can keep the relative mean-square error consistently below 20%, while also being computationally feasible. Implementing this 2-week sequential updating procedure thus allowed us to incorporate information regarding changing transmission conditions in each county as effectively as possible into the model. Supplementary Table S2 summarizes the values of the posteriors obtained by model fitting to the 14-day case data prior to September 30th for the social distancing parameter, d, which modifies the transmission rate, β, in each county, as well as for the fraction of the respective county population remaining under mobility restriction as estimated using the Unacast mobility data (see "Methods"). The results show that while the values of each of these two key social parameters varied between the present counties, the values of the social distancing parameter, d, appear to be less variable than those estimated for the lockdown fractions (Supplementary Table S2). This suggests that the between-county variations in SARS-CoV-2 outbreak dynamics, and the individual county-level response to the intervention scenarios, reported here (from October 1st 2020) may be a reflection of the combined effect of the initial incidences and variations in the numbers of the susceptible populations in a county that are released from restricted movement.
Forecasting the epidemic and impacts of social interventions. We used the models updated using the infection/death data reported to September 30th 2020 in each county to simulate the local epidemic dynamics and to compare the dynamical impacts of six different social interventions as described in "Methods" and depicted in Supplementary Fig. S3. The model predictions for infected cases under each intervention scenario through the end of the year are shown in Figs. 
2 and 3, stratified by initial incidence growth rate (Group a < 0.05, Group b > 0.05-0.15, Group c > 0.15; see "Methods"). Scenario 1 is the least aggressive option considered with no interventions put in place and a full release of strict stay-at-home orders after September 30th. The results for this scenario show that because the social restrictions in Florida were lifted before the first epidemics ended, resurgence of the epidemic will be inevitable in every county (Fig. 2). Indeed, an average of 27% (with a range of 15-47%) of the overall population across all counties is projected to become infected at the peaks of the resulting 2nd waves under this scenario (Fig. 2, Table 1, Supplementary Table S1). Note that these, and subsequent model forecasts, account for all infected cases, including those who are not yet infectious (exposed class), asymptomatic, presymptomatic, and symptomatic. The county level model predictions from this scenario also highlight the variation in the peak size of the 2nd waves that different counties could have faced with the lifting of all interventions (Table 1). The time course of the epidemic is also variable with higher incidence counties expected to see generally later peaks compared to lower incidence counties (Fig. 2). Figure 2d shows that the predicted sizes of the 2nd wave peaks are directly related to those of the first waves that occurred in each county, further underlining the impact that initial variation in local conditions of virus transmission can have on the size of subsequent county-level infection resurgences. The simulations also reveal that while the size of the 2nd waves will be large with the full release of lockdowns, the epidemics in each county will nonetheless, as expected, eventually end (Fig. 2), with the possibility that this will occur earlier in the low incidence counties than in the high incidence counties, where the corresponding 2nd waves are predicted to end much later.
Figure 2. Epidemic forecasts for individual counties in the scenario where all interventions are lifted following the initial lockdown (S1). Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rate in each county (Group a < 0.05, Group b > 0.05-0.15, Group c > 0.15). The intervention scenario (also see Supplementary Fig. S3) represents a full release of lockdown and social distancing from October 1st. The y-axis is shown in log-scale to better visualize the difference between the size of the first and second epidemic waves. The range of epidemic peaks and epidemic ending dates for each group is shown by red and blue dotted vertical lines, respectively. The 2nd wave peaks of infected cases occur between September 27th-November 2nd for Group a, October 3rd-November 7th for Group b, and October 3rd-November 2nd for Group c. The epidemic ending dates occur between December 2nd-January 21st, 2021 for Group a, December 2nd-January 25th, 2021 for Group b, and January 16th, 2021-March 2nd, 2021 for Group c, respectively.
Figure 3. Epidemic forecasts for individual counties under four different social intervention scenarios. Note the differences in y-axis values when comparing scenarios. Each curve represents the median prediction for a given county. The results are stratified by initial incidence growth rates observed in each county.
The inclusion of social distancing measures through October and November in scenarios 2 and 3, respectively, is predicted to have two key effects (Fig. 3). 
First, it is to be noted that these measures will not prevent the occurrence of sizable 2nd waves; however, they will shift the timing of the county-level 2nd epidemic peaks further into the future, with this shift more pronounced for scenario 3 (from the October/November peaks predicted for scenario 2 to the December/January peaks forecasted for scenario 3; see legend to Fig. 3). Second, these measures will significantly reduce the size of the 2nd epidemic peaks, with the more intensive scenario 3 bringing about an average case reduction of 76% (with a range of 37-83%) compared to scenario 1 (Table 1). Again, these outcomes will vary by county group, with shifts to 2nd peaks, and resolution of the epidemics, occurring generally later among the high incidence counties (Group c), while the greatest reduction in peak cases will occur for the low incidence counties (Fig. 3, Table 1).
Implementation of contact tracing and quarantine measures to prevent a fraction of the infectious population from spreading the disease (25% quarantine rate in scenario 4 and 50% quarantine rate in scenario 5, both through March 2021), along with sustained social distancing measures through December 2020, is predicted to have a uniformly high suppressive effect on the course of the 2nd waves in each county. Scenario 4 reduces the size of the averaged county-level epidemic peak to affect just 3% of the total population (with a range of 0-17%), while scenario 5 is predicted to reduce the average peak infection size to just 0.6% of the overall population in the state (with a range of 0-8.5%). This represents an average peak reduction compared to scenario 1 of 89% (with a range of 61-97%) for scenario 4 and 98% (with a range of 80-100%) for scenario 5. 
Furthermore, the results show that scenario 5 could even bring about breakage of epidemic transmission in some counties, particularly in the case of those that exhibited the lowest initial infection incidences (Table 1) . Interestingly, these intervention scenarios also appear to generally lessen the between-county group variations in the timings and peaks of the predicted 2nd epidemics (Fig. 3) . The above scenario differences are also apparent in the predictions of the required hospitalizations at the county level (Fig. 4) . However, the hospitalization forecasts also indicate that without the more aggressive interventions, such as those modeled in scenarios 4 and 5, there would be a high risk that predicted cases will exceed existing county-level hospital capacities (Table 2 ). This serious outcome will also vary significantly by geographic location. Figure 5 shows the predictions arising from the simulation of the most intensive of the social intervention scenarios (scenario 6), viz. maintaining social distancing, lockdown, and a 25% quarantine rate from October 1st through the end of our simulation period, March 2021. The results show that this scenario is the only one among the six scenarios investigated that would have prevented the occurrence of a 2nd wave of COVID-19 in all the modeled counties. It would also hasten the ending of the epidemic locally with all low incidence counties predicted to achieve elimination of their epidemics as early as between November 1st 2020 to January 26th 2021, whereas high incidence counties will see their corresponding epidemics ending between December 20th 2020 to February 25th 2021. Figure 6 depicts the proportions of the populations recovering from infection and thus developing immunity to infection through time in each county for scenarios 1 (top panel) and 6 (bottom panel). These results show that these scenarios may bring about extinction of the epidemic in each group of counties via different mechanisms. 
In the case of scenario 1, the epidemics are ended through the development of high levels of herd immunity (between 88 to 97%) in the community, with as expected lower population-level immunity required to bring out epidemic extinction in the lower incidence counties. By contrast, the results for scenario 6 indicate that extinction can also be brought about by instituting strong long-duration social containment measures that can reduce transmission to sufficiently low levels to bring about epidemic fade-outs. However, it is important to note that this impact comes with the cost of generating very low levels of herd immunity in each community by the end of the epidemics, raising the possibility of the inevitable resurgence of transmission should new infected individuals bring the virus back into these communities. Although scenario 1 can produce high levels of herd immunity (Fig. 6) , it is clear, however, that this would be associated with higher death tolls than in the case of scenario 6 ( Fig. 7) . Cumulative predicted deaths through the entire period of these simulations (i.e. October 1st 2020 to end of March 2021) ranged from 70 to 4223 in the low incidence counties to as high as 28,130 in the high incidence counties in the case of scenario 1. These were significantly lower in the case of scenario 6 with the corresponding cumulative deaths ranging from 6 to 300 in the low incidence counties to between 185 to 3330 in the high incidence counties (Fig. 7) . These findings underscore the health-economy trade-offs involved in using an approach focused on the evolution of herd immunity as opposed to one based on the use of more socially-disruptive measures for containing the present pandemic. An immediate full release of social measures is the least economically disruptive option, but results in higher cases, hospitalizations, and deaths. 
By contrast, implementing longer periods of social distancing measures will optimize reductions in health outcomes but will affect the working of the economy. Our goals in this work were two-fold. The first was to assess if it is possible to use publicly available longitudinal infection case and human movement data to derive reasonable mathematical models of SARS-CoV-2 transmission to allow the simulation and evaluation of the course of the ongoing pandemic at the local county level in the United States. The second goal was to evaluate how the viral contagion dynamics might interact with social options for controlling COVID-19, such that the results could be used to identify those measures that will enable the safe containment of the virus in the absence of a viable vaccine. We also attempted to determine the implications of variance in virus transmission risk across smaller spatial units within a region, such as local counties, for the design of the optimal social strategies for curbing the contagion. Developing and using reliable data-driven models for forecasting live local epidemics is challenging given the need for both the locality-specific temporal data required for updating models, and the necessity that predictions have to be made within the lead times requisite for making effective public responses 22 . Here, we have addressed this problem via the implementation of an iterative data-model assimilation-based forecasting system that acquires and processes the latest data, updates our SEIR COVID-19 model, and generates new sequential forecasts over time. 
Key features of the system include procedures that leverage the availability of open source API-enabled case and mortality surveillance data that are reported daily by health departments at the county level 47, the incorporation of independently-quantified county-wide non-essential movement data to serve as an estimator of the level of population mixing through time, and the Bayesian calibration of our model on a sequential basis. An additional recent feature of the developed system is the use of a continuous analysis framework to automate the computational pipeline that handles the various stages of converting the raw data into new forecasts, including data assembly, modelling and forecasting, and presentation of the forecasts relevant to policy makers 27, 45. This makes it possible to generate high-quality forecasts for a large number of study settings much more effectively and speedily. We have used our modelling system to examine the effectiveness of a range of likely SARS-CoV-2 social intervention scenarios for containing the contagion at local levels, starting with investigations of the impact of the state-wide phased easing of the community lockdown that was implemented in Florida between April 3 2020 and May 4 2020 (scenario 1). Figure 2 depicts the major outcome of this policy response, viz. the inevitability of the emergence of significant 2nd waves in all counties if all other social measures, such as social distancing (mask wearing and observation of physical distancing), are also discontinued before the 1st local epidemics are fully ended. While this prediction can be considered to be as expected and so unremarkable, a striking and possibly less commented upon feature, however, is that the predicted sizes of the 2nd wave peaks or intensity of the 2nd waves will vary between counties as a positive function of variations in the size of their 1st wave peaks. 
This indicates that the subsequent intensity of virus transmission following the full release of all social protective measures in a community will depend fundamentally on the initial incidence at the time of epidemic establishment in a locality. It will also depend on the number of infected individuals remaining in each county following the ending of the state-wide lockdown that took place before the 1st waves had been fully controlled. These findings highlight the dangers of imposing a one-size-fits-all policy (here pertaining to the decision to ease lockdowns on a common date across Florida) for managing a spatially variable contagion. The results pertaining to the numerical size of the 2nd peaks are shown in Table 1, and indicate that the health burden of the pandemic would also vary markedly between counties from as low as 1459 cases in Lafayette to as high as 742,898 cases for the most populous Miami-Dade county. This further underscores the fact that apart from variable resurgences in infection, spatial heterogeneity in infection burdens could also be expected if social measures are fully released in all counties. There would also be considerable variations in the course of the 2nd waves even within each incidence group (Fig. 2), although in general low incidence counties would present earlier and lower 2nd wave peaks compared to the later timings and higher peaks predicted for the corresponding medium and high incidence group of counties. Full release of social measures would, however, as expected, result in the eventual extinction of the local epidemics in all counties, because of depletion of susceptible individuals and development of high proportions of immune individuals in each population (Fig. 6). 
Although this result supports the notion of permitting the development of herd immunity in populations as one way of controlling the pandemic, the predictions regarding the local and state-wide hospitalization cases (Table 2) and deaths (Fig. 7) point to the dangers of adopting such an option 8-10, 12. Our simulations of the impacts of social control measures of varying strength and nature demonstrated overall the vital importance of continuing with these measures following phased lockdown release for containing the epidemic while waiting for more effective and less-socially disruptive pharmaceutical measures (Figs. 2, 3, 6; Table 1). According to our findings, while the two social measures investigated here, viz. maintaining current social distancing over shorter (to October 14th) and longer (to November 30th) periods, and testing, contact tracing and quarantining at a moderate (25%) or higher (50%) level, would not have prevented the emergence of 2nd waves in the majority of cases, they would nonetheless delay infection peaks and reduce the numbers of patients requiring admissions to hospitals. Indeed, the inclusion of quarantine measures to March 2021, while ending social distancing by mid-October or end November 2020, for example, would have decreased the 2nd wave peak numbers of infection or hospitalizations required to over 90% of the respective 1st peak numbers in each county. Inclusion of strong contact tracing and quarantine (scenario 5) could have led to very low levels or even interruptions of epidemic transmission and zero hospitalization cases in those counties exhibiting the lowest incidence rates among the present counties (Tables 1 and 2). 
This is an important outcome as it suggests that if testing and contact tracing were ramped up, then counties would have been able to lift their social distancing measures and hence reopen their economies by end of 2021 without fearing that such an option would have led to an overwhelming of their hospital capacities. We show that the most intensive social intervention modelled in this study, viz. phased lockdown release, maintenance of current social distancing measures and a 25% quarantine rate all maintained from October 1st to end of March 2021, was the only social option that would have not only prevented the occurrence of a 2nd wave, but also hastened the ending of the epidemic in all counties, with extinctions predicted to be possible as early as November 1st in some low-medium incidence counties (Fig. 5). However, like scenarios 4 and 5 but unlike the full release of social measures modelled in scenario 1, this scenario will also be marked by the low level of herd immunity that would develop in populations by the end of the local epidemics, leaving communities vulnerable to the real threat of future epidemic resurgences should the virus be re-introduced after the lifting of interventions (Fig. 6). This finding indicates that either maintaining continued vigilance and control by testing and contact tracing measures will be required to counter this prospect of epidemic resurgence in these communities over the foreseeable future, or that ultimately, control of the epidemic would only be achieved through effective vaccination of county populations. Our results show that in the latter case, vaccination rates (with a highly effective vaccine) will need to be above 85% and even above 90% in the medium-high incidence counties (Fig. 
6) to accomplish the resolution of the pandemic, although note that if significant population heterogeneity underlies virus transmission within a county, much lower rates (to as low as 50%) might be sufficient to arrest the local epidemic 6, 21. This is provided that the developed immunity operates over the long-term. Our county-level forecasts also suggest that a spatially-tailored response would be more effective at minimizing harmful effects in communities, not just in relation to health outcomes, but also in terms of minimizing the disruption to the local and global economic and other social systems. Thus, we show that while combining social distancing measures to end of December 2020 with high intensity contact tracing and quarantine to March 2021 (scenario 5) could have depressed hospitalization cases within manageable levels across virtually all county incidence groups, it would have been possible to contain the pandemic in some low and medium incidence counties with a version of this scenario (scenario 4) that implements only low intensity quarantine (Fig. 4, Tables 1 and 2). This would allow reopening of the economies of these counties earlier than for high incidence counties, lessening the economic and other social disruptions faced by the populations of these counties. Similarly, our predictions of the impact of scenario 6, in which all interventions are implemented from October 1st 2020 onwards, indicate that resolutions of the epidemic would occur significantly earlier in low incidence counties than in the case of medium and high incidence counties, suggesting that a safe reopening of the state of Florida and indeed other US states could be effectively accomplished in a geographically phased manner that takes into account county-level variations in epidemic risk explicitly. Indeed, our web-based SEIRcast COVID-19 simulation tool (https://seircast.org/), which implements our iterative data-driven continuous integration modelling framework, is designed to provide policymakers with the means to devise precisely such spatially-explicit management plans. We believe that including this spatial dimension in both models and mitigation plans would not only provide for better predictions of the pandemic dynamics across a spatial domain, but would additionally result in significantly better overall social outcomes for state populations.
While our findings imply that social measures in general are highly effective in containing and curbing COVID-19 transmission, further work to address the rapidly changing transmission conditions affecting the pandemic and emerging interventions will undoubtedly be required to extend the applicability of the present results. Perhaps a first need is to address what impact the advent of vaccines would have on the need for continuing with the social measures investigated to allow the safe reopening of parts of the populations in Florida as early as possible. The key question here is whether NPI strategies will need to continue and indeed must remain the mainstay of our attempts to contain the contagion even with the roll out of vaccinations. Indeed, if the present vaccines can only be delivered in a phased age-targeted manner, are not perfect but instead reduce susceptibility by a fraction, and if the immunity induced is not long-term or is countered by virus mutants, then there is a need to investigate how best to adapt the social measures studied here along with vaccination to bring about the containment or resolution of the pandemic effectively 54. We are currently extending our model to include these various vaccination scenarios to address this policy question. It is also clear going forward that we need to consider the effects that between-county movement might have on the current model predictions. 
While personal movement was curtailed drastically by lockdown, and the phased ending of the lockdown has led to increased movement within counties (both of which we have been able to incorporate into our model via parameterization of the within-county movement data provided by Unacast), details of inter-county movement and its reliable incorporation into our model will be required if we are to better capture the impacts of state-wide policies that are beginning to focus on lifting of all restrictions fully 55. Recently, Unacast 52 has begun to publish population migration data in the US using cell-phone signals, which will provide a means to address this topic.
Our current model also does not represent the age-structure and health status of the county-level populations. Partly this is an outcome of our goal to develop a modelling system that would support the generation of forecasts for the contagion in all counties of the United States based on the data presently publicly available for facilitating model configurations, and these currently lack information on these variables 47. Extending our SEIR model to include these features, however, would allow better treatments of the exposure, risk, and transmission conditions that are likely to underlie the spatial heterogeneity in epidemic dynamics observed at the county level 18, 19, 31. The addition of population structure and health composition into our current SEIR model will require deriving and adding more compartments and the applicable contact matrices 10, 56, 57, but also, as noted, the configuration data for parameterizing these additions appropriately. We are currently in the process of adapting the data from the POLYMOD study 53, 57 to begin the construction of the relevant social contact matrices and parameterizations required for accomplishing these major extensions to the model. 
Nonetheless, it is to be noted that our data-assimilation approach to estimating the transmission rate in each county (both the median and range of values from the ensemble of best-fit models) implicitly does allow capture of the contributions of age-structural and other differences in transmission between counties, suggesting we have been able to approximate the impacts of this factor to a reasonable degree on the results presented here. Our sequential data-assimilation framework, while allowing the incorporation of longitudinal changes in transmission conditions into the model, has the outcome, as for all dynamical models, that prediction error will increase the further out of sample forecasts are made 22-25, 27, 29. While we have attempted to reduce forecast errors by model fitting to two sets of variables (infection cases and deaths), obtaining new data on other currently latent states (e.g. the fraction of asymptomatic infected cases) would offer better constraining of parameters and hence forecast variance. However, this must be balanced by appropriately addressing the effects of parameter degeneracy and sample impoverishment, which would impact the ability of the model to fit novel data as transmission conditions change drastically over the near future 37, 58-60. To address this problem in the simulations reported here, we have used a resampling approach whereby, at each sequential updating point, we blended 25% random samples from the initial priors into the posteriors obtained during the update made one time step (2 weeks) previously, which kept forecast error below 20%. However, future work might need to consider the development of appropriate adaptive approaches developed in the field of particle filtering 61 to resolve this problem more effectively. 
Regardless, we note that while our forecasts beyond 2 weeks ahead could attain variances as high as 40%, and so can affect the peak sizes and extinction dates reported here, this will have lesser impacts on the conclusions reached regarding the comparative outcomes of the interventions investigated in this study.
Epidemic model. We simulated the ongoing SARS-CoV-2 outbreaks at the county level using a variation of the SEIR model. The model compartments and transitions are shown in Fig. 8. Full equations are also given in Supplementary Material. We assume each county is a closed population and ignore demographic changes such that the total population size remains constant. The population is divided into compartments representing various infection stages: susceptible (S), susceptible but removed from the transmission process via lockdown policies (R1), exposed (E), infectious asymptomatic (I [...] Table S3). Note that the model considers the fraction of the population classified in each compartment (all compartments sum to 1), which is then scaled to the appropriate county population size to get counts for each compartment.
Data. To calibrate the model to the local county setting, we fitted the SEIR model sequentially (see below) to cumulative confirmed case and death data assembled from the start of the epidemic at the county level and published for public access by the Johns Hopkins University Coronavirus Resource Center 47. The county population sizes are also made available via this database, which we use to scale the model predictions (see above). Hospital bed capacities in each county are provided by the Agency for Health Care Administration for the State of Florida, accessed on April 5th 62. A 7-day moving average is applied to the case and death data to smooth out testing irregularities.
Estimation of initial epidemic growth rate. 
The initial incidence growth rate, τ, was estimated by fitting a log-linear model to the daily new cases reported during the early exponential phase (generally the first 4 weeks) of the epidemic curve observed in each county 63. The values estimated for τ were used to stratify the counties in Florida into initially low (< 0.05), medium (> 0.05 to < 0.15) and high (> 0.15) incidence or epidemic groups.

Bayesian melding data assimilation. We used a Monte-Carlo-based Bayesian melding framework to undertake the sequential updating of the model to the cumulative case and death data 40, 64, 65. We began by defining uniform prior distributions for each of the model parameters based on current understanding of SARS-CoV-2 transmission and disease characteristics. These initial parameter priors and relevant references are given in Supplementary Table S3. Note that the number of initial infected cases at the start of the simulation period is sampled as the parameter E0 (Supplementary Fig. S4), the number of exposed cases when the first cases began to be confirmed. We consider the start of the epidemic in each county to be when at least 10 cases have been reported. At this point, we sampled N = 50,000 parameter vectors from the initial priors and simulated the outbreak for 14 days forward. The resulting 50,000 model predictions of the epidemic were then compared to the confirmed case and death data observed during the 14-day forecast period using a modified root-mean-square error (RMSE) distance metric that normalizes a traditional RMSE by the standard deviations of these data.
This facilitated the combination of prediction errors with respect to case and death data despite their different orders of magnitude. In this metric, n is the number of time points over which the model predictions are compared to data, ŷi is the model-predicted confirmed case count on a given date i, yi is the observed confirmed case count for the same date, x̂i is the model-predicted death count on date i, and xi is the observed death count for the same date. Based on this performance metric, the best-fitting 500 parameter vectors are retained as the most likely parameter sets to describe the local outbreak during the chosen 14-day window. For simulating the epidemic over the next 14-day period, another 50,000 parameter sets are sampled, of which 75% are randomly sampled from the posterior distribution of the most recent 14-day window, while the remaining 25% are sampled from the initial parameter priors to avoid sample depletion 59, 60. This set of blended parameter vectors is used to sequentially select the best-fitting models over time, and the selected models are used to forecast the impacts of the interventions described above. Different fitting windows were tried, and a 14-day window was found to be long enough to be computationally feasible for the entire dataset for all counties, while being short enough to capture the changing epidemic behavior and keep forecast error consistently low (below 20%). The best-fitting parameter vectors are used to simulate future scenarios. The forecasts allow for the prediction of future waves of infection, which are defined as a sustained positive growth rate of cases, leading to a maximum. The subtleties of identifying the exact timing of waves given trivial oscillations in the data have been explored in other works 66.

Simulating interventions. We used the latest sequentially fitted model in each county to simulate the impacts of different social intervention scenarios on the course of the outbreak in the future (beyond October 1st 2020).
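The fitted ensemble that feeds these simulations comes from the sequential update described under Bayesian melding. A minimal Python sketch of one such update is given here to make the sample, score, retain and blend steps concrete. It is not the authors' MATLAB implementation. Several details are assumptions made for illustration: the exact form of the modified RMSE (here, the two normalized errors are simply added, each divided by the population standard deviation of the observed window), resampling from the posterior by drawing with replacement from the retained vectors, and the names `normalized_rmse`, `combined_error`, `melding_step` and `toy_simulate`. The toy exponential-growth simulator is a hypothetical stand-in for the SEIR model.

```python
import math
import random
import statistics


def normalized_rmse(pred, obs):
    """RMSE of pred against obs, divided by the standard deviation of obs."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    sd = statistics.pstdev(obs)  # assumption: population SD of the observed window
    return rmse / sd if sd > 0 else rmse


def combined_error(pred_cases, obs_cases, pred_deaths, obs_deaths):
    # Assumption: the case and death terms are simply added once each is normalized.
    return normalized_rmse(pred_cases, obs_cases) + normalized_rmse(pred_deaths, obs_deaths)


def melding_step(simulate, priors, previous, obs_cases, obs_deaths,
                 n_samples=50_000, n_keep=500, prior_frac=0.25, rng=None):
    """One fitting window: draw candidates, simulate, score, keep the best n_keep."""
    rng = rng or random.Random()
    # First window: every draw comes from the priors. Later windows: prior_frac from
    # the priors, the rest resampled with replacement from the previously kept vectors.
    n_prior = n_samples if not previous else round(prior_frac * n_samples)
    candidates = [{k: rng.uniform(lo, hi) for k, (lo, hi) in priors.items()}
                  for _ in range(n_prior)]
    candidates += [dict(rng.choice(previous)) for _ in range(n_samples - n_prior)]
    scored = []
    for theta in candidates:
        pred_cases, pred_deaths = simulate(theta, len(obs_cases))
        scored.append((combined_error(pred_cases, obs_cases, pred_deaths, obs_deaths), theta))
    scored.sort(key=lambda pair: pair[0])
    return [theta for _, theta in scored[:n_keep]]


def toy_simulate(theta, n_days):
    """Hypothetical stand-in for the SEIR model: exponential growth from 10 cases."""
    cases = [10.0 * math.exp(theta["r"] * t) for t in range(n_days)]
    deaths = [theta["cfr"] * c for c in cases]
    return cases, deaths


rng = random.Random(0)
priors = {"r": (0.0, 0.3), "cfr": (0.0, 0.1)}  # uniform priors (illustrative values)
obs_cases, obs_deaths = toy_simulate({"r": 0.12, "cfr": 0.02}, 14)  # synthetic "data"

# Small sample sizes keep the demonstration fast; the paper uses 50,000 and 500.
window_1 = melding_step(toy_simulate, priors, None, obs_cases, obs_deaths,
                        n_samples=2000, n_keep=50, rng=rng)
window_2 = melding_step(toy_simulate, priors, window_1, obs_cases, obs_deaths,
                        n_samples=2000, n_keep=50, rng=rng)
```

In real use, each call would be given the next 14 days of smoothed case and death data, and the simulator would start from the model state at the end of the previous window. The sketch reuses the same synthetic window twice only to show how the 75%/25% blend is formed.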
We simulated six different scenarios, which are outlined graphically in Supplementary Fig. S3. Scenario 1 represents the least aggressive option, where lockdown and social distancing measures (such as modified behavior, physical distancing, mask wearing, and increased sanitization) are fully lifted after September 30th. Scenario 2 maintains lockdown in addition to keeping social distancing measures in place for 2 weeks from October 1st to October 14th. We consider scenarios 1 and 2 to mimic the State of Florida's reopening plan (https://floridahealthcovid19.gov/plan-for-floridas-recovery/). Scenario 3 extends the social interventions (lockdown plus social distancing measures) by maintaining them over a longer 8-week period to November 30th. Scenarios 4 and 5 represent maintaining current social distancing and movement restrictions through the end of the year (December 2020) in addition to implementing contact tracing and quarantine efforts at either low (q = 0.25) or high (q = 0.50) intensity, respectively, from October 1st to the end of March 2021. Finally, Scenario 6 represents the most intense intervention scenario, viz. maintaining social distancing, lockdown, and low quarantine from October 1st through to the end of March 2021. We considered this to be the most intense intervention because we maintain all three social interventions for the longest period of time. We implemented a low quarantine effort in this scenario to represent an incremental increase in the intensity of interventions relative to Scenario 5. This also allowed us to investigate whether, given the longer period of interventions, this was sufficient to have the biggest impact on the pandemic among all the scenarios investigated in this paper. Outbreaks were simulated under these conditions, and the predicted numbers of cases were compared between scenarios and county groups.
The numbers of hospitalized individuals (IH + IC) are also forecasted to evaluate the potential resource needs under each scenario. The effect of statewide lockdown measures in these simulations is implemented by adding a distinct susceptible class (R1), which is assumed not to contribute to disease transmission. The proportion of this class that complies with strict stay-at-home orders is controlled through the ratio of parameters α and λ. This ratio is informed by the fraction of non-essential trips made by the population in each county as estimated by Unacast based on analyses of GPS mobility data 52. Mobility data have been used as a predictive tool across many domains: to build mobility networks 67, to study the relationship between mobility and financial market performance 68, and to understand which mobility patterns lead to an increased risk of death due to COVID-19 infection 69. We interpret the reduction in such non-essential trips relative to the period before the lockdown as a proxy for the proportion of the susceptible population remaining unexposed in a county at any time during a simulation, both during the lockdown and after lockdown measures were lifted. Note that incorporating this fraction of protected susceptibles into the model in this way also allows us to address the question of self-isolation by individuals indirectly. Social distancing measures are modeled as a reduction in the transmissibility of the pathogen through the parameter d, whose value is primarily based on the effectiveness of masks against transmission of similar diseases 70. Although the parameter d could capture the effect of lockdown in addition to social distancing measures, we decided to model the population under lockdown separately using the independent Unacast data, to retain our ability to investigate, if required, the relative impacts of these measures on the course of the pandemic in future simulations.
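The way these intervention parameters could enter a transmission term can be sketched in a few lines of Python. This is a hedged illustration, not the model's actual equations, which are given in the Supplementary Material and may differ. It assumes that d is a fractional reduction in transmissibility, that the quarantine fraction q (q = 0.25 or 0.50 in the scenarios) removes that share of the asymptomatic, presymptomatic and mild infectious fractions from transmission, that these three classes are equally infectious, and that the share of susceptibles held in R1 settles at α/(α + λ). The function names and the symbol `beta` for the baseline transmission rate are hypothetical.

```python
def lockdown_share(alpha, lam):
    """Assumed equilibrium share of susceptibles in R1 when S -> R1 occurs at
    rate alpha and R1 -> S at rate lam."""
    return alpha / (alpha + lam)


def force_of_infection(beta, d, q, i_a, i_p, i_m):
    """Assumed form: beta scaled by (1 - d), acting on the non-quarantined
    share (1 - q) of the asymptomatic, presymptomatic and mild fractions."""
    return beta * (1.0 - d) * (1.0 - q) * (i_a + i_p + i_m)


# Illustrative values only; none are taken from the paper's fitted parameters.
baseline = force_of_infection(beta=0.5, d=0.0, q=0.0, i_a=0.02, i_p=0.03, i_m=0.05)
low_q = force_of_infection(beta=0.5, d=0.2, q=0.25, i_a=0.02, i_p=0.03, i_m=0.05)
high_q = force_of_infection(beta=0.5, d=0.2, q=0.50, i_a=0.02, i_p=0.03, i_m=0.05)
print(lockdown_share(alpha=1.0, lam=3.0), baseline, low_q, high_q)
```

Keeping the lockdown share separate from d mirrors the choice described in the text: mobility data inform how many susceptibles are shielded, while d scales the transmissibility experienced by those who remain exposed.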
Quarantine of infectious cases through contact tracing and/or testing is modeled simply as a proportion, q, of IA, IP, and IM not contributing to transmission as a result of being detected and made to isolate themselves at home.

References
Covid-19 and community mitigation strategies in a pandemic
All hands on deck: A synchronized whole-of-world approach for COVID-19 mitigation
Beyond just "flattening the curve": Optimal control of epidemics with purely non-pharmaceutical interventions
The lockdowns worked-but what comes next?
Estimating the seroprevalence of SARS-CoV-2 infections: Systematic review
COVID-19 herd immunity: Where are we?
Dynamic interventions to control COVID-19 pandemic: A multivariate prediction modelling study comparing 16 worldwide countries
Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy
Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: A modelling study
First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: A modelling impact assessment
Epidemic analysis of COVID-19 in China by dynamical modeling
Novel coronavirus 2019-nCoV: Early estimation of epidemiological parameters and epidemic predictions
Why is it difficult to accurately predict the COVID-19 epidemic?
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling study
Forecasting COVID-19
COUnty aggRegation mixup AuGmEntation (COURAGE) COVID-19 prediction
An open-data-driven agent-based model to simulate infectious disease outbreaks
Using data-driven agent-based models for forecasting emerging infectious diseases
Transmission dynamics reveal the impracticality of COVID-19 herd immunity strategies
A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2
Environmental Modelling: An Uncertain Future?
Prediction in ecology: A first-principles framework
Iterative near-term ecological forecasting: Needs, opportunities, and challenges
The model-data fusion pitfall: Assuming certainty in an uncertain world
The role of data assimilation in predictive ecology
Developing an automated iterative near-term forecasting system for an ecological study
Ecological forecasting and data assimilation in a data-rich era
Short-term forecast validation of six models
Metapopulation network models for understanding, predicting, and managing the coronavirus disease COVID-19
Spatially explicit models for exploring COVID-19 lockdown strategies
Understanding spatial propagation using metric geometry with application to the spread of COVID-19 in the United States
Identifying US County-level characteristics associated with high COVID-19 burden
INDEMICS: An interactive high-performance computing framework for data-intensive epidemic modeling
FluTE, a publicly available stochastic influenza epidemic simulation model
Recent advances in computational epidemiology
A sequential Monte Carlo approach for marine ecological prediction
On-demand data assimilation of large-scale spatial temporal systems using sequential Monte Carlo methods
Continental-scale, data-driven predictive assessment of eliminating the vector-borne disease, lymphatic filariasis, in sub-Saharan Africa by 2020
Inference for deterministic simulation models: The Bayesian melding approach
Bayesian calibration of simulation models for supporting management of the elimination of the macroparasitic disease, Lymphatic Filariasis. Parasites Vectors 8, 522
Sequential Monte Carlo without likelihoods
Disease transmission models for public health decision making: Toward an approach for designing intervention strategies for Schistosomiasis japonica
Automated data-intensive forecasting of plant phenology throughout the United States
Reproducibility of computational workflows is automated using continuous analysis
A semantic framework for modeling and simulation of cyber-physical systems
An interactive web-based dashboard to track COVID-19 in real time
Combining computational models, semantic annotations and simulation experiments in a graph database
Docker: Lightweight linux containers for consistent development and deployment
Sequential data assimilation: Information fusion of a numerical simulation and large scale observation data
Best practices for computational science: Software infrastructure and environments for reproducible and extensible research
Social distancing scoreboard
SOCRATES: An online tool leveraging a social contact data sharing initiative to assess mitigation strategies for COVID-19
Will an imperfect vaccine curtail the COVID-19 pandemic in the
Association between mobility patterns and COVID-19 transmission in the USA: A mathematical modelling study
A multi-group SEIRA model for the spread of COVID-19 among heterogeneous populations
Social contacts and mixing patterns relevant to the spread of infectious diseases
On sequential Monte Carlo sampling methods for Bayesian filtering
Particle filters and data assimilation
Proceedings of the Conference on Summer Computer Simulation 1-10 (Society for Computer Simulation International
Florida Agency for Health Care Administration
Outbreak analytics: A developing data science for informing the response to emerging pathogens
Geographic and ecologic heterogeneity in elimination thresholds for the major vector-borne helminthic disease, lymphatic filariasis
Modelling Parasite Transmission and Control
COVID-19 in the United States: Trajectories and second surge behavior
Mobility network models of COVID-19 explain inequities and inform reopening
Efficiency of communities and financial markets during the 2020 pandemic
Stay-at-home works to fight against COVID-19: International evidence from Google mobility data
The effect of mask use on the spread of influenza during a pandemic

All data analyzed during this study are included in this published article and its supplementary information files. Code for sequentially calibrating and running the SEIR model is available at https://github.com/EdwinMichaelLab/COVID-SEIR-Paper.

This work was made possible by an internal grant from the University of South Florida. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. A portion of the model runs was carried out using the MATLAB Parallel Computing Toolbox made available on Compute Clusters of the University of Notre Dame's Center for Research Computation. The authors are also grateful to one of the reviewers for comments on the modified RMSE used in this paper.

The authors declare no competing interests.

The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-04899-4.

Correspondence and requests for materials should be addressed to E.M.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.