key: cord-0255340-r34oq460 authors: Jain, S.; Tiwari, A.; Bannur, N.; Deva, A.; Shingi, S.; Shah, V.; Kulkarni, M.; Deka, N.; Ramaswami, K.; Khare, V.; Maheshwari, H.; Dhavala, S.; Sreedharan, J.; White, J.; Merugu, S.; Raval, A. title: A Flexible Data-Driven Framework for COVID-19 Case Forecasting Deployed in a Developing-world Public Health Setting date: 2021-11-10 journal: nan DOI: 10.1101/2021.11.01.21260020 sha: 4a73c15afed044972e6f8d1fd0a316c2e3b6cde9 doc_id: 255340 cord_uid: r34oq460 Forecasting infection case counts and estimating accurate epidemiological parameters are critical components of managing the response to a pandemic. This paper describes a modular, extensible framework for a COVID-19 forecasting system, primarily deployed in Mumbai and Jharkhand, India. We employ a variant of the SEIR compartmental model motivated by the nature of the available data and operational constraints. We estimate best-fit parameters using Sequential Model-Based Optimization (SMBO) and describe the use of a novel, fast, and approximate Bayesian model averaging method (ABMA) for parameter uncertainty estimation that compares well with a more rigorous Markov Chain Monte Carlo (MCMC) approach in practice. We address on-the-ground deployment challenges such as spikes in the reported input data using a novel weighted smoothing method. We describe extensive empirical analyses to evaluate the accuracy of our method on ground truth as well as against other state-of-the-art approaches. Finally, we outline deployment lessons and describe how inferred model parameters were used by government partners to interpret the state of the epidemic and how model forecasts were used to estimate staffing and planning needs essential for addressing COVID-19 hospital burden. The ongoing COVID-19 pandemic has spurred intense interest in epidemiological forecasting models. The need for robust solutions has been especially pressing in dense populations across the developing world, with their limited health resource availability and long lead times for addressing shortfalls. To reduce mortality, it is important to ensure adequate capacity availability of critical health care resources. There is a need for forecasting reported infections at a local level to inform capacity planning, model the effects of policy changes, and prepare for potential scenarios. In this paper, we describe a deployed forecasting framework that was used in Mumbai, India, a densely populated city, as well as in other resource-constrained regions such as the state of Jharkhand, India, during the first COVID-19 infection wave. Our partners for data and usage of the solution were the Brihanmumbai Municipal Corporation and the Integrated Disease Surveillance Programme, respectively, at the two locations. Successful deployment of epidemiological models requires addressing operational challenges inherent in the public health landscape. Data limitations. Data collection and management activities during a pandemic are severely strained by the demands placed on public health authorities, leading to data discrepancies. In the face of these data challenges, it is important to choose model families whose complexity matches that of the available data. Need for rich insights and what-if scenarios. The choice and implementation of correct policy is often predicated not just on expected case counts but also on changes in underlying epidemiological parameters. It is therefore important to construct models that are highly interpretable with independently verifiable parameters.
Dynamic user requirements. Forecasting needs are highly dynamic as the epidemic progresses. In the early stages, when containment is critical, accurate regional forecasts for short horizons are valuable. In later stages, the emphasis shifts to estimation of hospital burden and to uncertainty around those projections. A robust yet flexible modeling framework that can support multiple application-specific needs is thus essential. These challenges motivate the need for a forecasting framework that can support data-driven parameter estimation and uncertainty quantification for interpretable models. Such models cannot be directly learned using gradient-based learning approaches. Further, variations in data semantics and user requirements necessitate a solution approach that is agnostic to the choice of model class and application-specific prediction quality criteria. The general epidemiological model parameter estimation problem can be stated as follows. For a given spatial region, let $X[t]$ denote a multivariate time series of case counts corresponding to different demographic groups and disease stages. Let $M_\theta$ be a black-box epidemiological forecasting model parameterized by $\theta \in \Theta \subseteq \mathbb{R}^k$ that generates a future forecast from historical observations. Given a time series loss function $\mathcal{L}$, the objective is to determine the optimal parameters $\theta^*$ such that the loss between the predicted and observed data over a specified horizon $[t_1, t_2]$ is minimized:
$$\theta^* = \arg\min_{\theta \in \Theta} \mathcal{L}\big(\hat{X}[t_1:t_2],\, X[t_1:t_2]\big), \quad \text{(1)}$$
where $\hat{X}[t_1:t_2] = M_\theta(X[0:t_1-1])$. To estimate parameter uncertainty, it is also necessary to generate the resulting forecast distribution over the time horizon. Deployment scenario. In our deployment scenario, the primary data comprised counts of confirmed, active, recovered, and deceased cases. For certain regions, severity-stratified counts and testing data were also available. Reliable counts of in-flow and out-flow cases and within-region mobility indicators were rarely available. While public health partner requirements changed over time, they were largely focused on hospital burden estimation over a horizon of 30-45 days. The main need was to ensure adequate medical resources while minimizing excess capacity that would remain under-utilized. A readily interpretable relative accuracy criterion such as Mean Absolute Percentage Error (MAPE) in primary case count forecasts was well-aligned with our partners' needs, but the relative importance of the four different case counts could vary. We built a multi-purpose epidemiological forecasting framework with public health requirements in mind. This system was used to drive decision-making and planning in Mumbai and Jharkhand, India during the first COVID-19 wave. Our framework consumes aggregate case count data collated by government officials from health facilities, and outputs burden estimates that were used by relevant authorities for subsequent planning of personnel and supplies. This paper discusses the main elements of our forecasting framework and makes the following contributions. • System and Process Design. [Section 3]. We describe a practical, modular, and extensible learning-based epidemic case forecasting system that is customizable to individual regions and application scenarios. The system consists of modules for data ingestion, preprocessing and exploratory analysis, model fitting, scenario-conditioned forecasting, and application-specific report generation. • Modeling Methodology. [Sections 4 and 5].
We present techniques for model- and loss-agnostic estimation of parameters via Sequential Model-Based Optimization (SMBO) [7]. Combining SMBO sampling with Bayesian model averaging enables fast approximate quantification of forecast uncertainty. We demonstrate that this is empirically comparable to a more rigorous Markov Chain Monte Carlo (MCMC) approach but computationally faster. We develop smoothing methods to handle data issues arising out of reporting delays. • Epidemiological Model Choice. [Section 4]. We argue that conditions of interpretability and identifiability warrant a simple variant of the SEIR compartmental model, especially when only confirmed, active, recovered, and deceased case counts are observed. • Empirical Results. [Section 5]. We present extensive empirical analyses detailing the optimization of relevant hyperparameters, field predictive performance for the city of Mumbai, and comparison with other state-of-the-art models hosted by ReichLab [1]. • Deployment Lessons. [Section 6]. We summarize the key lessons from deployment of our modeling framework in Mumbai and Jharkhand. The audience for this work consists of applied researchers working on practical forecasting and public health officials. The work presented here builds on four areas of epidemiological modeling: a) forecasting, b) parameter estimation (with uncertainty), c) modeling with data limitations, and d) public health deployment. Epidemiological Forecasting. The COVID-19 pandemic and associated global forecasting challenges [1, 24] have spurred new research on modeling infectious disease spread. There are three broad classes of models. (a) Compartmental models assume that individuals in a population at any given time are assigned to one of several states, called compartments, and move between these compartments. Examples include the SIR [25] and SEIR [22] models. Over the last year, variants of the SEIR model have been widely used to study the COVID-19 pandemic and project hospital burden [28, 36, 37], incorporating aspects such as age-stratification [2, 20], asymptomatic transmission [34], and effects of social distancing measures [8, 16]. (b) Agent-based models [17, 38] simulate interactions and disease stage transitions of individual agents. (c) Curve-fitting models fit parameterized curves to data. Examples include the exponential growth model and the IHME CurveFit model [21]. Model Parameters and Uncertainty Estimation. Multiple works address the problem of epidemiological parameter estimation with forecast uncertainty [9, 11, 12, 15], but these techniques are usually specific to the model class and involve assumptions on the generative process. Recently, there has been increased focus on practical aspects of model fitting such as the choice of training duration [30] and identifiability issues [27, 32]. Model-agnostic evaluation of forecasts is another related area of interest [1, 19, 35]. While our deployment was based on compartmental models, the proposed techniques are model-agnostic and can be adapted to any application-specific loss function. Modeling with Data Limitations. Epidemiological modeling in the developing world is plagued by data paucity and quality issues. Various studies have focused on understanding transmission dynamics in such limited-data settings [4, 26, 33]. In our current work, we discuss some of these challenges and their possible resolution via appropriate model choices and data preprocessing. Public Health Deployment.
Practical use of epidemiological models in public health response requires a holistic view of government priorities, policy levers, and processes. Work in economic epidemiology [10, 31] is targeted towards supporting decision-making related to interventions and policy choices. Multiple organizations [1, 29] currently share automated COVID-19 case forecasts with relevant public health authorities, but the forecasts are not always customized for decision-making. Our deployment involved a two-way partnership with the government, providing capacity planning guidance and insights into pandemic dynamics tailored to partner requirements. This section provides an overview of our end-to-end epidemic forecasting system, as visualized in Figure 1. Figure 1: COVID forecasting framework. The three-phase pipeline consists of a) data operations: working with government officials to consolidate and store data collected from various sources throughout the city, b) a modular modelling framework for fitting and forecasting, and c) report generation of customized forecasts for clients. The specific components for deployment with government partners are highlighted in orange and connected by a green line delineating the workflow. Partnership with the government and official data access were enabled in the early days of the pandemic by providing proof-of-concept forecasts that initially used case counts from public sources such as Covid19India [13] and JHU [18]. A data schema for aggregate case counts was defined to facilitate automated forecasting. However, realisation of this schema across upstream providers proved to be a challenge. Issues faced included those described in the Introduction: duplicate records, point-of-entry data entry errors, inconsistencies, and errors in reporting transitions between care levels and disease stages. We therefore developed point-of-entry validation, along with partial automation of data consolidation and de-duplication. This semi-automated process was aided by sharing spreadsheets with partners on our cloud storage. Our pipeline performed quality checks and anonymized sensitive data before exposing it to the modelling team via an SQL-compliant database.¹ We also maintained descriptive data visualizations to monitor abnormal patterns. This pipeline was later replicated across partners. In order to flexibly support multiple forms of case count data, epidemiological models, and loss functions, our modelling framework has a modular design with five high-level components: data loaders, models, learners, loss functions, and publishers,² each suited to multivariate time series forecasting. This extensible and scalable framework allows rapid experimentation with different modelling techniques and data sources of varying complexity, and provides a platform for comparing results. Examples of these components are shown in Figure 1. Modelling details are discussed in Section 4. The content and format of forecast presentations varied depending on the audience. The solution framework allowed model outputs to be consumed in a variety of ways: as time-series forecasts, planning reports (for public authorities), or data pushes (for submissions to ReichLab; Section 5.5). Forecasts were either routine (for monthly planning) or ad hoc (to address rapidly evolving scenarios).
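To make the modular design just described concrete, the following is a minimal, illustrative Python sketch of how the five component types (data loaders, models, learners, loss functions, and publishers) could be wired together. The interfaces, class names, and method signatures are assumptions made for exposition and do not correspond to the deployed codebase.

```python
# Illustrative sketch (not the deployed codebase): one possible set of interfaces
# for the five framework components described above. All names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Protocol

import pandas as pd


class DataLoader(Protocol):
    def load(self, region: str) -> pd.DataFrame:
        """Return a time-indexed frame of confirmed/active/recovered/deceased counts."""
        ...


class Model(Protocol):
    def simulate(self, params: Dict[str, float], n_days: int) -> pd.DataFrame:
        """Generate a forecast of the case-count time series for n_days."""
        ...


class LossFunction(Protocol):
    def __call__(self, predicted: pd.DataFrame, observed: pd.DataFrame) -> float:
        ...


class Learner(Protocol):
    def fit(self, model: Model, data: pd.DataFrame, loss: LossFunction) -> Dict[str, float]:
        """Search the parameter space and return the best-fit parameters."""
        ...


class Publisher(Protocol):
    def publish(self, forecast: pd.DataFrame, destination: str) -> None:
        """Emit a forecast as a report, dashboard payload, or hub submission."""
        ...


@dataclass
class ForecastPipeline:
    """Wires the five components together for one region and one forecast round."""
    loader: DataLoader
    model: Model
    learner: Learner
    loss: LossFunction
    publisher: Publisher

    def run(self, region: str, horizon_days: int, destination: str) -> pd.DataFrame:
        observed = self.loader.load(region)
        best_params = self.learner.fit(self.model, observed, self.loss)
        forecast = self.model.simulate(best_params, n_days=horizon_days)
        self.publisher.publish(forecast, destination)
        return forecast
```

A design of this shape allows a new model family, loss function, or data source to be swapped in without touching data ingestion or report generation, which is the extensibility property emphasized above.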
Forecast requests, compiled by team members physically located in COVID 'war rooms', included information on recent changes in mobility, expected changes in policy over the planning horizon, and notes on data, in order to appropriately define plausible parameter ranges. Requests were used to generate forecasts, which were organized into deciles of the forecast distribution (anchored on a future date determined by the planning horizon). Specific scenarios (mapped to deciles of the forecast distribution) were selected to represent an appropriate 'planning scenario', a 'low case scenario', and a 'high case scenario'. These interactions not only allowed public authorities to understand the state of the pandemic through changes in model parameters, but also led to the consideration of information unavailable to the model (expected changes to testing strategy, treatment policy, and non-pharmaceutical interventions such as shelter-in-place orders). In the capacity computation module, projected active case numbers and trends in utilization (occupancy rates and length of stay) were used to estimate facility-level demand for personnel, quarantine beds, oxygen-supported beds, and intensive care units. These estimates were distilled into final recommendations consisting of projected cases for a 30-45 day horizon, potential shortfalls, and recommended increases to capacity, based on resource capacity proportions shared by government partners (see Figure 12). This section describes our approach to parameter estimation and uncertainty quantification. We also discuss smoothing strategies for handling data spikes and the choice of epidemiological model used in the COVID-19 engagement. Given the nonlinear nature of the optimisation problem of Equation 1, we employ a Sequential Model-Based Optimization (SMBO) method to explore the search space of parameters. We use the Expected Improvement (EI) criterion [23] to return the best parameter set at each iteration and the Tree-structured Parzen Estimator (TPE) [6] to model the loss. We use the Hyperopt implementation of TPE [7]. Note that this black-box optimization method is model-agnostic and applies to any zeroth-order optimization problem. We estimate parameter uncertainty by approximating the posterior distribution of the parameters given the data. Given a dataset $D$ and loss function $\mathcal{L}(\theta)$ as defined in Equation 1, we assume that the likelihood function is tuned to the loss function as
$$P(D \mid \theta) \propto \exp(-\alpha\,\mathcal{L}(\theta)), \quad \text{(2)}$$
where $\alpha$ is a concentration parameter to be estimated as described below. When the loss function $\mathcal{L}(\cdot)$ is a regular Bregman divergence, the distribution corresponds to a uniquely determined exponential family [3, 5]. Choosing the parameter prior $p(\theta)$ to be from the conjugate prior family of the resulting exponential family yields a posterior distribution of the form
$$P(\theta \mid D) \propto h(\theta)\,\exp(-\alpha\,\mathcal{L}(\theta)), \quad \text{(3)}$$
where $h(\theta)$ is uniquely determined by the loss function [14]. Expected values under the posterior distribution can be computed by first sampling from the prior $p(\theta)$ with appropriate reweighting and then computing importance sampling estimates.
Thus, given parameter samples $\{\theta_i\}_{i=1}^{N}$ from the reweighted prior, and the corresponding loss function value $\mathcal{L}(\theta_i)$ for each sample, the expected value of any function $g(\theta)$ is estimated as
$$\mathbb{E}[g(\theta)] \approx \frac{\sum_{i=1}^{N} g(\theta_i)\, e^{-\alpha\,\mathcal{L}(\theta_i)}}{\sum_{i=1}^{N} e^{-\alpha\,\mathcal{L}(\theta_i)}}. \quad \text{(4)}$$
Approximate Bayesian Model Averaging. A limitation of this approach is that importance sampling with a broad prior as the proposal distribution may require a prohibitively large number of samples. We therefore considered an approximation, hereafter referred to as Approximate Bayesian Model Averaging (ABMA), where we apply Equation 4 not to samples from the prior, but rather to the sequential samples obtained through SMBO. The intuition behind doing so is that these samples represent regions close to the optimal parameter values where the loss is likely to be small, and hence should capture the dominant behavior in sample averages like Equation 4. Additionally, the SMBO implementation is computationally efficient. In Section 5, we show via comparison with an MCMC approach that this is a reasonable approximation. The term ensemble mean, used throughout this paper in the context of parameter and forecast averaging, refers to taking the sample means according to Equation 4. In the application of ABMA, it remains to fix the values of the concentration parameter $\alpha$ and the number of samples $N$. We estimate $\alpha$ by minimizing the loss between the ensemble mean forecast and the ground truth on a validation time interval. To fix $N$, we simply choose a value large enough that the mean and variance of the parameter samples, computed through application of Equation 4, converge. MCMC-based Estimation of Parameter Uncertainty. A more rigorous method to sample parameter values is via Markov Chain Monte Carlo (MCMC) sampling from the posterior distribution of the parameters given the data, i.e., $P(\theta \mid D)$. Assuming an appropriate prior over $\theta$ and a likelihood function $P(D \mid \theta, \sigma)$, where $\sigma$ denotes additional parameters in the likelihood function (analogous to $\alpha$ in Equation 2), we may consider the larger problem of sampling from the extended posterior distribution $P(\theta, \sigma \mid D)$, where we assume $\theta$ and $\sigma$ have independent priors. This larger sampling problem is solved by a combination of MCMC and Gibbs sampling steps, using a truncated Normal distribution $q(\cdot)$ (truncated at the boundaries of the search space) as the proposal distribution for MCMC moves. We choose initial values $\theta_0, \sigma_0$, and then employ the following Metropolis-within-Gibbs procedure to alternately sample from the conditional posterior distributions $P(\theta \mid \sigma, D)$ and $P(\sigma \mid \theta, D)$. 1) MCMC sampling with target distribution $P(\theta \mid \sigma, D)$. Given samples $\theta_{i-1}, \sigma_{i-1}$ from iteration $i-1$, sample a new $\theta_i$ according to the Metropolis criterion, using the proposal distribution $q(\theta \mid \theta_{i-1})$. 2) Sampling from the conditional posterior distribution $P(\sigma \mid \theta, D)$. Given the $\theta_i$ sampled in the previous step, sample $\sigma_i \sim P(\sigma \mid \theta_i, D)$. This sampling can be made straightforward by choosing a prior for $\sigma$ that is conjugate to the likelihood distribution. The steps above together constitute the Gibbs sampling updates, and are repeated alternately until convergence of the Markov chain. This eventually results in a sample from the extended distribution $P(\theta, \sigma \mid D)$. Given an ensemble of ABMA parameter samples $\{\theta_i\}$, one can compute forecast trajectories for each parameter set in the sample. For a forecast trajectory $\hat{X}_{\theta_i}(t)$, the effective probability density function (pdf) is proportional to $e^{-\alpha\,\mathcal{L}(\theta_i)}$. The forecast quantiles on a given day $t^*$ are obtained by sorting the forecasted case counts corresponding to the different parameter sets on day $t^*$ and using the cumulative distribution function (CDF) to find the quantile of interest.
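As an illustration of the workflow described above, the sketch below strings together SMBO fitting (via the Hyperopt TPE sampler named in Section 4.1), ABMA-style importance weights $e^{-\alpha\,\mathcal{L}(\theta)}$ over the sampled parameter sets, and a per-day weighted quantile. The SEIARD simulator, the search-space ranges, and the value of $\alpha$ are placeholders; only the overall flow mirrors the method in the text.

```python
# Hedged sketch of SMBO fitting (Hyperopt TPE) followed by ABMA-style weighting.
# `simulate_seiard`, the parameter ranges, and `alpha` are placeholders.
import numpy as np
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe


def mape(pred, obs):
    return float(np.mean(np.abs(pred - obs) / np.maximum(np.abs(obs), 1e-9)) * 100)


def fit_and_weight(observed, simulate_seiard, n_trials=3000, alpha=1.0):
    """Run SMBO, then weight every sampled parameter set by exp(-alpha * loss)."""
    space = {                                      # illustrative search space
        "beta": hp.uniform("beta", 0.0, 1.0),
        "P_fatal": hp.uniform("P_fatal", 0.0, 0.2),
        "T_recov": hp.uniform("T_recov", 5.0, 40.0),
    }
    trials = Trials()

    def objective(params):
        pred = simulate_seiard(params, n_days=len(observed))
        return {"loss": mape(pred, observed), "status": STATUS_OK, "params": params}

    fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=n_trials, trials=trials)

    losses = np.array([r["loss"] for r in trials.results])
    params = [r["params"] for r in trials.results]
    # Shifting by the minimum loss only rescales the weights; it cancels after normalization.
    weights = np.exp(-alpha * (losses - losses.min()))
    weights /= weights.sum()
    return params, weights


def weighted_quantile(values, weights, q):
    """Quantile of `values` under the discrete distribution given by `weights`."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    idx = min(int(np.searchsorted(cdf, q)), len(values) - 1)
    return values[order][idx]


def forecast_distribution(params, weights, simulate_seiard, horizon, day_index, q=0.5):
    """Ensemble-mean forecast and a per-day weighted quantile on a chosen day.

    The simulator is assumed to return a 1-D array of daily counts for a single
    compartment, for simplicity.
    """
    trajectories = np.stack([simulate_seiard(p, n_days=horizon) for p in params])
    ensemble_mean = np.average(trajectories, axis=0, weights=weights)
    day_quantile = weighted_quantile(trajectories[:, day_index], np.asarray(weights), q)
    return ensemble_mean, day_quantile
```

In the deployed setting, $\alpha$ would itself be tuned on a short validation window and the quantiles recomputed for each forecast day, as described in the text.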
This procedure has the desirable property that a trajectory at a fixed quantile value corresponds to a single parameter set. However, forecast trajectories for different parameter sets can cross in time. Therefore, quantiles computed at one time point are generally not preserved in time. A work-around is to re-estimate quantiles on each day and piece together a "fixed quantile" trajectory using the forecasted values for that quantile on each day. A trajectory so constructed, however, does not correspond to a single set of epidemiological parameters and is therefore difficult to interpret. We use the former method for the sake of interpretability in our deployment and specify the day $t^*$ at which quantiles are computed, while the latter method is used for "fixed quantile" forecast comparisons. One of the practical issues we faced was spikes in case count data, originating in delayed reporting of test results and tracking of recoveries and deaths. This leads to accumulated occurrences from the past being reported on a single day. Such spikes therefore need to be smoothed by distributing them several days into the past, even though the actual attributable dates were unavailable. We implemented a smoothing technique that carries out this redistribution under some assumptions. Consider a spike on day $t_0$ in a cumulative case count series $X$, given by $\Delta = X[t_0] - X[t_0 - 1]$, that can be attributed to delayed activity since day $t_s$. We construct a smoothed time series by redistributing $\Delta$ over the interval $[t_s, t_0]$, for example uniformly or in proportion to the observed incident counts on each day (the "Proportional counts" variant evaluated in Section 5). Note that a spike in one compartment could correspond to a compensatory dip or a spike in a different compartment such that the total case count stays constant. For example, a spike in recoveries must be compensated by a dip in active cases, since recoveries deplete the active cases. Thus, over the duration of smoothing, the active cases are adjusted by the additional recoveries, resulting in a smoothed version of active cases. The model is then fitted on the smoothed case time series rather than on the raw counts. Model Choice Criteria. The choice of epidemiological model was driven by four main criteria: expressivity, or the ability to faithfully capture the disease dynamics; learnability of parameters conditioned on the available data; interpretability in order to understand the evolution of the pandemic; and generalizability to future scenarios by incorporating additional information. We observed that simple exponential growth based models that do not account for decreasing susceptibility and finite populations became inaccurate as the pandemic progressed. Agent-based models [17, 38], on the other hand, are overparameterized, with a low level of learnability. While curve-fit models and compartmental models had similar levels of accuracy, the parameters of the latter were readily interpretable;³ compartmental models, therefore, represented a natural choice. SEIARD Model. We chose a simple, minimal-complexity compartmental model that only included compartments for primary case counts or those essential for expressivity. Accessible case count data typically comprised cumulative counts of confirmed cases ($C$), active cases ($A$), recovered cases ($R$), and deceased cases ($D$), with $C = A + R + D$. We assumed confirmed cases to be post-infectious owing to strict quarantining measures.
We explicitly modeled the latent "incubation" and "infectious" stages that occur prior to detection so as to allow what-if scenario modeling with changes in isolation and testing policies. Figure 2 depicts the structure of this SEIARD model (see Supplement for the equations). The population is split into compartments [S, E, I, $A_{\text{recov}}$, $A_{\text{fatal}}$, R, D]. The states S (Susceptible), E (Exposed), and I (Infectious) and the associated parameters (transmission rate $\beta$, incubation period $T_{\text{inc}}$, and infectious period $T_{\text{inf}}$) are defined as in the classic SEIR model [22]. An individual who tests positive moves to a post-infectious but active phase, which is split into $A_{\text{fatal}}$ and $A_{\text{recov}}$ based on the eventual outcome: fatality (D) with probability $P_{\text{fatal}}$, or recovery (R). $T_{\text{fatal}}$ and $T_{\text{recov}}$ are the mean durations from detection to eventual death or recovery, respectively. Observed case count data is mapped to compartment populations as follows: $R = |\mathrm{R}|$, $D = |\mathrm{D}|$, $A = |A_{\text{recov}}| + |A_{\text{fatal}}|$. Interpretable parameters such as the reproduction number $R_0 = \beta\,T_{\text{inf}}$, $T_{\text{recov}}$, and $P_{\text{fatal}}$ provide an understanding of the pandemic evolution and reporting gaps, and allow scenario-conditioned forecasting. Changes in the recovery policy and upcoming events such as festivals can be simulated by appropriate adjustment of $T_{\text{recov}}$ and $\beta$. Parameter Fitting Considerations. Estimating the parameters of this model from case counts posed several challenges: a) Parameter drift due to changes in social distancing policy necessitated refitting the model to a recent time window for every forecast round. b) Initial values of the latent compartments⁴ E and I could not be set to 0 because the fitting time interval could occur at an intermediate stage of the epidemic; we treat these latent variables as additional parameters to be estimated. c) Undetected infections are not observed directly; we allow the latent compartments E and I to be free parameters whose fitted values adjust to accommodate the effect of undetected cases. d) Identifiability of all key parameters was addressed by incorporating existing domain knowledge about $T_{\text{inc}}$, $T_{\text{inf}}$, and $P_{\text{fatal}}$ from raw (possibly incomplete) case line lists, when available (Figure 3). (⁴ S is not viewed as latent since the net population can be assumed to be nearly constant.) To demonstrate the value of the proposed framework, we report (a) empirical results validating choices of hyperparameters, uncertainty estimation, and data preprocessing, (b) field performance and impact of the deployed system in Mumbai, and (c) empirical comparison of our approach with other state-of-the-art (SOTA) models in the ReichLab hub on COVID-19 case data from the US. The main elements of our experiment setup are: Data. We consider time series of confirmed ($C$), active ($A$), recovered ($R$), and deceased ($D$) cases. Results shown are based on real data from the city of Mumbai except for the hyperparameter optimization discussion (Covid19India data) and the comparison with ReichLab models (JHU data on 45 US regions). Algorithms. Various alternatives within our proposed framework (ABMA vs. MCMC, smoothing variants) as well as 26 different models from ReichLab were considered. Evaluation Metric. Mean Absolute Percentage Error (MAPE) along each of the four variables ($C$, $A$, $R$, $D$), aggregated over the compartments with compartment-specific weights, is used as the primary metric for evaluation and comparisons.
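To pin down the evaluation metric just stated, here is a small sketch of a compartment-weighted MAPE. The column names and the equal default weights are illustrative assumptions; the actual weighting varied with partner priorities, as noted earlier.

```python
# Minimal sketch of a compartment-weighted MAPE; column names and the equal
# default weights are illustrative assumptions, not the deployed configuration.
from typing import Dict, Optional

import numpy as np
import pandas as pd


def weighted_mape(predicted: pd.DataFrame, observed: pd.DataFrame,
                  weights: Optional[Dict[str, float]] = None) -> float:
    """MAPE per compartment (confirmed, active, recovered, deceased), combined with weights."""
    weights = weights or {"confirmed": 0.25, "active": 0.25,
                          "recovered": 0.25, "deceased": 0.25}
    total, weight_sum = 0.0, 0.0
    for column, w in weights.items():
        obs = observed[column].to_numpy(dtype=float)
        pred = predicted[column].to_numpy(dtype=float)
        # Percentage error per day, guarding against division by zero.
        col_mape = float(np.mean(np.abs(pred - obs) / np.maximum(np.abs(obs), 1e-9))) * 100
        total += w * col_mape
        weight_sum += w
    return total / weight_sum
```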
Uncertainty estimation is discussed in terms of credible intervals. Our experiments indicated that smoothing (in particular, the "Proportional counts" variant) leads to better prediction of future cases. We evaluated the smoothing variants discussed in Section 4.3 on simulated incident (new) case data with a simulated spike caused by delays in data reporting between two dates. Mirroring on-the-ground reporting discrepancies, we assume that only a fraction of the true incident cases is reported each day during the delay period, with the fraction drawn from a Beta(2, 2) distribution. Other parameters are chosen according to Table 4. Smoothing methods are then applied to the simulated data generated after the dependent case counts are adjusted to sum to the confirmed cases. Table 1 and Figure 4 show the results of the smoothing experiments. Overall, we found that the "Proportional counts" smoothing method works best at reconstructing the raw data with reasonable accuracy. Within our modeling framework, the following hyperparameters need to be tuned to achieve good performance: a) the length of the fitting period (in days) used to estimate the model parameters $\theta$, b) the number of days of data used to tune the concentration parameter $\alpha$, and c) the number of SMBO trials used to optimise $\theta$. In addition, it was also necessary to identify a suitable loss function and to compare the performance of TPE to other fitting techniques. Details of these tuning experiments are given in the Supplement. We compared the ABMA method of Section 4.2 to the MCMC sampling approach described there, along three dimensions: a) comparison of the lowest loss values achieved, b) comparison of case count forecasts, and c) distributional comparison of the sampled parameters. For these comparison experiments we chose a duration of 28 days for fitting, from 1 October to 28 October 2020, and a duration of 21 days, from 1 November to 21 November 2020, for evaluation. The three days in between were used for tuning the concentration parameter $\alpha$ in the ABMA procedure. The time taken to generate MCMC samples was noticeably higher than that for ABMA: under a controlled compute setting, ABMA was found to be an order of magnitude faster. Comparison of the best loss values achieved. We evaluated the MAPE losses of the forecasted ensemble mean time series $\hat{X}$ on the test duration, corresponding to parameters sampled by the ABMA and MCMC algorithms (Figure 5). Note that for MCMC samples, the ensemble mean forecasts are generated by simply averaging the forecasts for the sampled parameter sets, while for ABMA samples, the ensemble mean forecasts are generated by weighting each forecast by its importance sampling factor. The figure illustrates that the differences between the MAPE loss values are minor when comparing both ensemble means on the test set. Comparison of case count forecasts. We now turn to a comparison of the forecasts themselves. Figure 6 shows comparisons of the ensemble mean forecasted values and their 95% credible intervals for the four compartments studied, generated using quantiles recomputed daily. We find that mean values and intervals agree quite well for all compartments except active cases. Interestingly, for active cases, the true case count is observed to fluctuate between the MCMC and ABMA values. The ABMA framework was used to provide actionable insights for the city of Mumbai, India from May 2020 to October 2020.
It is also currently being used to forecast hospital burden in the state of Jharkhand, India. Figure 7, showing the ensemble mean forecast against the ground truth, and Table 2, showing the estimated parameters and loss values, confirm the high accuracy and interpretability of our methodology across different phases of the first wave of the pandemic. Recommendations made using this forecasting framework helped increase Mumbai's ICU bed capacity by over 1200 units, with 95% ICU utilization when hospitalizations peaked. Moreover, over the deployment period, no absolute shortfall of critical health care resources became apparent. We used the ReichLab forecasting hub [1] for US states as a basis for comparing the performance of our forecasting framework with other methods (Table 3). The source of data for the forecasts is the JHU CSSE data [18]. We evaluated our method only on regions for which all four primary case counts were available (44 states plus Washington D.C.). Parameter and hyperparameter choices for the evaluation are in the Supplement (Table 4). The basis for comparison with other models is the MAPE value on deceased case counts. We found that 26 models submitted to ReichLab had submissions for at least 45 regions over the duration studied, with median MAPE values ranging between 1.21% and 4.26%. The top 10 models by median MAPE are shown in Table 3. Our forecasts have low error and compare well to other models without the need for any region-specific customisation of hyperparameters or fitting method. We further analyzed the distribution of errors across regions (Figure 8) and assessed performance relative to other modeling approaches for each US region using a normalized MAPE defined as
$$\mathrm{nMAPE}(m) = \frac{\mathcal{L}(m) - \mathrm{MED}_{m' \in \mathcal{M}}\,\mathcal{L}(m')}{\mathrm{MAD}_{m' \in \mathcal{M}}\,\mathcal{L}(m')},$$
where $\mathcal{L}(m)$ is the MAPE loss for model $m$, $\mathcal{M}$ is the set of all models, and MED and MAD denote the Median and the Median Absolute Deviation operators, respectively. The variation in relative model performance across states could be attributed to changes in social distancing policies and the consequent variations in disease spread dynamics. Dynamic data entails humans in the loop. The data-related challenges mentioned in the Introduction led us to rely heavily on humans in the loop for tracking evolving data definitions and carrying out semi-automated quality checks. Data versioning and pre-processing prior to modeling (Section 5.1) were also essential. Model interpretability and identifiability are paramount. Our choice of the SEIARD model (Section 4.4) over other model families was motivated by planning needs and policy choices. The parameters therefore had to be interpretable, independently verifiable where possible, and robustly estimable from the available data. Application needs should dictate modeling choices. The focus on capacity planning gave tangible meaning to the horizon of interest: policies around capacity took about a month to implement. We customized our model fitting with the loss computed over this time horizon. When provided with information on upcoming policy changes and events (festivals, for example), we adapted the models accordingly to generate accurate forecasts. Insights must be actionable.
Model insights had to be translated into concrete action guidance to enable smooth planning. Uncertainty estimation (Section 4.2) allowed us to provide three relevant scenarios: a) a planning scenario, b) a high case scenario, and c) a low case scenario. Region-specific testing levels, the evolving severity of cases, and sero-surveillance information about the state of the pandemic were important factors informing the selection of planning scenarios. Capacity gaps at the last mile are hard to anticipate. Although there were no apparent capacity shortfalls at a city level, the ability of a critically ill patient to access these resources is mediated by granular factors such as access to information on the availability of beds, local emergency transportation, ability to pay, and other equity considerations, which need to be factored in. We have presented a flexible modeling framework and demonstrated its value for epidemic forecasting using the kind of aggregate case count data that is typically available in a constrained public health setting. The deployed system was used to drive decision-making and planning with good accuracy (worst-case MAPE < 20%) during the COVID-19 pandemic in Mumbai and Jharkhand, India. Our framework enables rapid forecasting with uncertainty estimates, and is extensible to other model families and to different loss functions. It also enables the optimization of hyperparameters such as fitting durations and ensemble weights. We motivated the choice of the specific compartmental model via the identifiability of its underlying parameters given the data constraints. Empirical comparison of our methods with other advanced models in the ReichLab hub on real-world data further points to their efficacy. In future work, we plan to open-source this framework, create a playbook around it, and apply it to the estimation of case burden for other infectious diseases, such as tuberculosis, that are widespread across the developing world. One limitation of the current framework that we recognize is that the underlying parameters are static. We plan to extend this modeling approach to capture time-varying yet interpretable parameters, and thus enable better forecasting. We list additional details required to reproduce the reported results. Table 4 below lists the parameter search space and hyperparameter ranges used in the experiments described in the main text. Since we wished to assess the robustness of optimal hyperparameters to the choice of location, we performed experiments for several regions of India that were highly impacted by the pandemic: Mumbai, Pune, Delhi, and Kerala. To facilitate comparisons, we used case count data from public sources [13], across two different stages of the pandemic: our evaluations were carried out on case counts from July 1 to 28, 2020 and November 1 to 28, 2020. Fitting was carried out over durations prior to these dates. The evaluation metric used was the MAPE value on the entire evaluation period.
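The evaluation protocol just described (fit on a window of data preceding the evaluation period, forecast that period, and score MAPE over the whole window) can be sketched as follows. The fit and forecast callables and the DataFrame layout are placeholders standing in for the framework's learner and model components.

```python
# Hedged sketch of the date-windowed evaluation protocol described above.
# `fit_model`, `forecast`, and the column layout are placeholders.
import pandas as pd


def evaluate_window(case_df: pd.DataFrame, eval_start: str, eval_end: str,
                    fit_days: int, fit_model, forecast, metric) -> float:
    """Fit on the `fit_days` days preceding eval_start, then score the evaluation window."""
    eval_start, eval_end = pd.Timestamp(eval_start), pd.Timestamp(eval_end)
    fit_df = case_df.loc[eval_start - pd.Timedelta(days=fit_days):
                         eval_start - pd.Timedelta(days=1)]
    eval_df = case_df.loc[eval_start:eval_end]

    params = fit_model(fit_df)                          # e.g. an SMBO fit on the fitting window
    predicted = forecast(params, index=eval_df.index)   # forecast over the evaluation window
    return metric(predicted, eval_df)                   # e.g. the weighted MAPE sketched earlier


# Example (hypothetical): evaluate the November 2020 window used in the supplement.
# score = evaluate_window(case_df, "2020-11-01", "2020-11-28",
#                         fit_days=28, fit_model=..., forecast=..., metric=weighted_mape)
```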
Due to the prohibitively large search space, a complete grid search over the hyperparameter choices was not feasible. We thus ordered the experiments and performed them sequentially, optimising one or more hyperparameters in each step and fixing them in subsequent experiments to optimise the others. We first performed a grid search over combinations of the fitting-period length and the number of days used to tune $\alpha$ (Figure 9) to find the best choices. We found that, within the selected fitting period, roughly 3000 or more trials were sufficient for convergence of the sample mean parameters, and thus chose this value for the number of trials. The application loss of interest is MAPE; in addition, we explored two other loss functions during fitting as well as estimation: root mean square error (RMSE) and root mean squared logarithmic error (RMSLE). The MAPE on the evaluation period with the three different fitting configurations was 5.50 (MAPE), 5.96 (RMSE), and 15.81 (RMSLE). Based on these results, the loss function selected was MAPE for both fitting and estimation. The dynamical equations governing the transitions in the SEIARD model (Figure 2) are
$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dE}{dt} = \frac{\beta S I}{N} - \frac{E}{T_{\text{inc}}}, \qquad \frac{dI}{dt} = \frac{E}{T_{\text{inc}}} - \frac{I}{T_{\text{inf}}},$$
$$\frac{dA_{\text{recov}}}{dt} = (1 - P_{\text{fatal}})\,\frac{I}{T_{\text{inf}}} - \frac{A_{\text{recov}}}{T_{\text{recov}}}, \qquad \frac{dA_{\text{fatal}}}{dt} = P_{\text{fatal}}\,\frac{I}{T_{\text{inf}}} - \frac{A_{\text{fatal}}}{T_{\text{fatal}}},$$
$$\frac{dR}{dt} = \frac{A_{\text{recov}}}{T_{\text{recov}}}, \qquad \frac{dD}{dt} = \frac{A_{\text{fatal}}}{T_{\text{fatal}}},$$
where $N$ is the city population; $\beta$, $T_{\text{inc}}$, and $T_{\text{inf}}$ are the standard epidemiological parameters of an SEIR model (transmission rate, incubation period, and infectious period); $P_{\text{fatal}}$ is the transition probability to the mortality branch; and $T_{\text{recov}}$ and $T_{\text{fatal}}$ are timescales that govern transitions out of the $A_{\text{recov}}$ and $A_{\text{fatal}}$ compartments. The variables $S$, $E$, $I$, $A_{\text{recov}}$, $A_{\text{fatal}}$, $R$, $D$ denote the populations of the similarly named compartments. Let $X[t] = [x_h[t]]_{h \in \mathcal{H}}$ be a multivariate time series with $x_h[t]$ denoting the $h$-th compartment time series, and $\mathcal{H}$ be the set of indices of components. Let the fitting period be given by $[t_1, t_2]$. The key components of our MCMC-within-Gibbs sampling are: Likelihood function. We assume a likelihood function of the form
$$P(D \mid \theta, \sigma^2) = \prod_{h \in \mathcal{H}} \prod_{t = t_1}^{t_2} \mathcal{N}\big(x_h[t] \mid \hat{x}_h[t](\theta),\, \sigma^2\big),$$
where $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ denotes the Normal distribution pdf with mean $\mu$ and variance $\sigma^2$, the latter following an appropriate conjugate prior. Proposal distribution. At iteration $i$, we generate samples for the accept-reject step from the proposal distribution
$$\theta \sim q(\theta \mid \theta_{i-1}) = \mathcal{N}_{\text{trunc}}(\theta \mid \theta_{i-1}, \Sigma_{\text{prop}}),$$
where $\theta_{i-1}$ is the parameter vector chosen at iteration $i-1$, and $\mathcal{N}_{\text{trunc}}(\cdot)$ is the pdf of a multivariate truncated Gaussian with the parameter range $[\theta_{\min}, \theta_{\max}]$ and covariance matrix $\Sigma_{\text{prop}}$ as listed in Table 4(g) (columns: Range and Prop. var). We further assume that $\sigma^2$, the variance of the Normal likelihood, has the conjugate prior $\sigma^2 \sim \text{InvGamma}(a, b)$. Thus, it is straightforward to show, by multiplying the Normal likelihood with the prior, that if the MCMC chain has sampled parameters $\theta$, the sample $\sigma^2 \sim P(\sigma^2 \mid \theta, D)$ is also drawn from an Inverse Gamma distribution, with parameters
$$a' = a + \frac{|\mathcal{H}|\,(t_2 - t_1 + 1)}{2}, \qquad b' = b + \frac{1}{2}\sum_{h \in \mathcal{H}} \sum_{t = t_1}^{t_2} \big(x_h[t] - \hat{x}_h[t](\theta)\big)^2.$$
Parameter ranges for our implementation are in Table 4. The MCMC implementation was adapted for $X = [C, A, R, D]$ with 5 chains of length 25, a stride of 5 for sampling, a burn-in length of 50% of the chain, $a = 40$, and $b = 2/700$. To validate the uncertainty estimates of Section 5.3, we compared the samples collected from both methods aggregated over 10 runs of each method (Figure 10). Major distributional differences are found only in $R_0$ and the active_ratio parameter, but even for these parameters there is significant overlap between the distributions sampled by the two methods.
Note that the distributions for the ABMA approach are weighted by the importance sampling factor $\exp(-\alpha\,\mathcal{L}(\theta))$. To compare the efficacy of our uncertainty estimation, we compared the forecast distributions of the models submitted to ReichLab (Section 5.5) using the quantile loss described below:
$$\mathrm{QL}_q(e) = \max\big(q \cdot e,\; (q - 1) \cdot e\big),$$
where $e$ is the error between the forecasted quantile and the ground truth. In our case, the error is measured by the MAPE loss function. For every (model, quantile) tuple, we computed a median quantile loss value aggregated across regions. ABMA quantiles are generated by recomputing them daily. Figure 11 shows the variation of the median quantile loss across quantiles for all models. The loss curves of our model and the UMass-MechBayes model (the top-performing model according to Table 3) are highlighted. While our model is one of the best around the 50th percentile, it exhibits a higher relative loss at extreme quantiles.
References
[1] Reichlab UMass Amherst. 2020. COVID-19 Forecast Hub.
[2] Age-stratified model of the COVID-19 epidemic to analyze the impact of relaxing lockdown measures: nowcasting and forecasting for Switzerland.
[3] Clustering with Bregman Divergences.
[4] Synthetic Data Generation for Improved COVID-19 Epidemic Forecasting. medRxiv.
[5] Information and Exponential Families in Statistical Theory.
[6] Algorithms for Hyper-Parameter Optimization.
[7] Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures.
[8] The challenges of modeling and forecasting the spread of COVID-19.
[9] Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases.
[10] Robust Lock-Down Optimization for COVID-19 Policy Guidance.
[11] Parameter estimation and uncertainty quantification for an epidemic model.
[12] Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecasts.
[13] Covid19India. 2020. Coronavirus in India: Latest Map and Case Count.
[14] Conjugate Priors for Exponential Families.
[15] Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases.
[16] Transmission dynamics of the COVID-19 outbreak and effectiveness of government interventions: A data-driven analysis.
[17] Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.
[18] COVID-19 Data Repository.
[19] Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone.
[20] Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy.
[21] Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries.
[22] The Mathematics of Infectious Diseases.
[23] A taxonomy of global optimization methods based on response surfaces.
[24] Kaggle. 2020. COVID19 Global Forecasting.
[25] A contribution to the mathematical theory of epidemics.
[26] Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach.
[27] Structural identifiability and observability of compartmental models of the COVID-19 pandemic.
[28] Projecting hospital utilization during the COVID-19 outbreaks in the United States.
[29] America's COVID warning system.
[30] Simulation of Covid-19 epidemic evolution: are compartmental models really predictive?
[31] Chapter 33: Economic epidemiology and infectious diseases.
[32] Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models.
[33] A mathematical model for COVID-19 transmission dynamics with a case study of India.
[34] Mathematical modeling of COVID-19 spreading with asymptomatic infected and interacting peoples.
[35] A Framework for Evaluating Epidemic Forecasts.
[36] Social distancing and mobility reductions have reduced COVID-19 transmission in King County.
[37] The impact of COVID-19 and strategies for mitigation and suppression in low- and middle-income countries.
[38] Modeling between-population variation in COVID-19 dynamics in Hubei.
Acknowledgements. We would like to thank Janak Shah and the WIAI Programs team for their support with data and deployment processes. This study is made possible by the generous support of the American People through the United States Agency for International Development (USAID). The work described in this article was implemented under the TRACETB Project, managed by WIAI under the terms of Cooperative Agreement Number 72038620CA00006. The contents of this manuscript are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. This work is co-funded by the Bill and Melinda Gates Foundation, Fondation Botnar, and CSIR-Institute of Genomics and Integrative Biology. It was enabled by several partners through the informal COVID19 data science consortium, most notably Shreyas Shetty, Anupama Agarwal, and a group of volunteer data scientists under the umbrella group "Data Science India vs. COVID". We thank Amazon Web Services for cloud credits and support, the Smart Cities Mission for providing early impetus to this work, and the Brihanmumbai Municipal Corporation and the Integrated Disease Surveillance Programme (Jharkhand) for support while facing unprecedented challenges.