key: cord-0943996-9kh7tex4
authors: Bannur, N.; Maheshwari, H.; Jain, S.; Shetty, S.; Merugu, S.; Raval, A.
title: Adaptive COVID-19 Forecasting via Bayesian Optimization
date: 2020-10-27
journal: nan
DOI: 10.1101/2020.10.19.20215293
sha: a0470557bf1f0780969b70a5927b78623a9140f4
doc_id: 943996
cord_uid: 9kh7tex4

Accurate forecasts of infections for localized regions are valuable for policy making and medical capacity planning. Existing compartmental and agent-based models for epidemiological forecasting employ static parameter choices and cannot be readily contextualized, while adaptive solutions focus primarily on the reproduction number. In the current work, we propose a novel model-agnostic Bayesian optimization approach for learning model parameters from observed data that generalizes to multiple application-specific fidelity criteria. Empirical results demonstrate the efficacy of the proposed approach with SEIR-like compartmental models on COVID-19 case forecasting tasks. A city-level forecasting system based on this approach is being used for COVID-19 response in a few highly impacted Indian cities.

The ongoing COVID-19 pandemic and the consequent devastating increase in morbidity and mortality [5] have accentuated the need for robust epidemiological forecasting models. Deployment of such models as part of the public health response requires support for (a) fine-grained contextualization to account for spatio-temporal variations in contact behaviour, lockdown, testing, hospitalization, and reporting policies, (b) multiple models depending on the case count availability (e.g., age-stratified or testing-based extensions), (c) varying data reliability due to reporting delays, and (d) multiple application use cases with different fidelity requirements (e.g., medical preparedness is tied to accurate 2-4 week forecasts, while long-term policy making might focus on peak estimation).
Most existing models [1, 7-11] that use static parameters from domain knowledge, and even adaptive likelihood maximization-based methods [4, 13], do not adequately address these requirements.

Problem Statement: For $t \in [0, T]$ and region $r$, given the case count time series $x(t, r)$ and region metadata $w(r)$ (e.g., population), forecast $\hat{x}(t, r)$ for $t \in [T, T + H]$ such that an application-specific loss $L(\hat{x}(:, r), x(:, r))$ on the forecast period is minimized.

BayesOpt-based Blackbox Learning: For any parametric forecasting model of the form $f(x(t), \theta) = \hat{x}(t + 1 : t + H)$, we optimize $\theta^* = \arg\min_{\theta} L(f(x(t'), \theta), x(t' + 1 : t' + H))$ using observations from the period $[t', t' + H]$ for an appropriate loss function $L(\cdot)$. Optimizers such as the hyperopt library [3] can be used.

Uncertainty Estimation: Since certain applications require confidence intervals, the parameter sets (or trials) explored during Bayesian optimization are used to construct a posterior distribution $P(\theta^{(i)} \mid D)$ on the parameter space given data $D$ via a mapping from the observed loss values $L(\cdot)$ and the generative distribution. For instance, in the case of exponential families [2], the posterior probability $P(\theta^{(i)} \mid D) \propto \exp(-\beta L(\theta^{(i)}, D))$, and $\beta$ is estimated via validation on a holdout period.

Model Class & Initial Conditions: For practical deployment, we chose SEIR extensions due to their parsimonious nature, flexibility to incorporate testing effects and stratification, and high interpretability. While observed compartments can be readily initialized, the initial values of unobserved compartments (e.g., exposed) are viewed as latent variables and estimated in the same way as other model parameters, thus also partially accounting for imported cases.

We evaluated the efficacy and flexibility of our approach applied to SEIR models on COVID-19 case data [6, 12] of multiple Indian districts for different periods, and on synthetic data, relative to other baselines with relevant choices of loss functions and varying data reliability.
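The black-box learning setup above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain four-parameter, discrete-time SEIR model rather than the extended model, uses MAPE on a single observed series as the loss, and swaps in uniform random search as a stand-in for a Bayesian optimizer such as hyperopt so that the example needs only the standard library. The function names, the search ranges, and the synthetic data are all hypothetical. The initial exposed count is sampled like any other parameter, mirroring the latent-variable treatment of unobserved compartments.

```python
import random


def simulate_seir(params, days, population, infectious0):
    """Daily-step SEIR simulation; returns the infectious count for each day."""
    beta, sigma, gamma, exposed0 = params
    s = population - exposed0 - infectious0
    e, i = exposed0, infectious0
    series = []
    for _ in range(days):
        new_exposed = beta * s * i / population   # S -> E
        new_infectious = sigma * e                # E -> I
        new_recovered = gamma * i                 # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        series.append(i)
    return series


def mape(predicted, observed):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(p - o) / max(abs(o), 1e-9) for p, o in zip(predicted, observed))
    return 100.0 * total / len(observed)


# Hypothetical search ranges (not taken from the paper).
SEARCH_SPACE = {
    "beta": (0.05, 1.0),       # transmission rate
    "sigma": (0.1, 0.5),       # 1 / incubation period
    "gamma": (0.05, 0.5),      # 1 / infectious period
    "exposed0": (0.0, 500.0),  # latent initial exposed count
}


def fit(observed, population, infectious0, n_trials=500, seed=0):
    """Random-search stand-in for the optimizer; returns (loss, params) sorted by loss."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = tuple(rng.uniform(lo, hi) for lo, hi in SEARCH_SPACE.values())
        forecast = simulate_seir(params, len(observed), population, infectious0)
        trials.append((mape(forecast, observed), params))
    trials.sort(key=lambda trial: trial[0])
    return trials


if __name__ == "__main__":
    # Synthetic "observations" generated from known parameters.
    truth = (0.3, 0.2, 0.1, 200.0)
    observed = simulate_seir(truth, 30, 1_000_000, 100.0)
    best_loss, best_params = fit(observed, 1_000_000, 100.0)[0]
    print(f"best MAPE: {best_loss:.2f}%  params: {best_params}")
```

Because the optimizer only ever sees `(params, loss)` pairs, the simulator and the loss can be replaced independently, which is the model-agnostic property the text relies on. The full list of trials is kept rather than just the best one, since the trials are reused for uncertainty estimation.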
Extensive experimentation was performed to identify the parameter search space and the optimal settings for the Bayesian optimization (e.g., the training period), as well as to estimate the accuracy for different lead times. For the sake of brevity, we present results in Figure 1 with the extended SEIR model (Figure 2) for four regions, with train and test periods chosen from July. The forecast variants shown correspond to the best fit, the average of the 10 best trials, and an appropriately weighted ensemble of all the trials. The loss function is the average MAPE over all the key case counts. The ensemble-mean provides a stable forecast (test MAPE < 10%).

Ongoing explorations include alternative methods of estimation of parameters and uncertainty under varying testing and mobility levels, as well as theoretical analysis of control mechanisms for SEIR family models.

The copyright holder for this preprint (this version posted October 27, 2020; https://doi.org/10.1101/2020.10.19.20215293), which was not certified by peer review, is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

6 APPENDIX

the transition times of various compartments, and the probabilities of parallel pathways. The observed compartments (active, recovered, deceased) are initialized from case counts, while the unobserved compartments (infectious, exposed) are handled as latent variables. The equations governing the dynamics are given below.
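Separately from the model dynamics, the three forecast variants reported in the results (best fit, average of the 10 best trials, and a weighted ensemble of all trials) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each trial is taken to be a `(loss, forecast)` pair, the ensemble weights are assumed to be proportional to `exp(-beta * loss)` in line with the exponential-family form mentioned for the posterior, and `beta` is a hypothetical scale parameter supplied by the caller. All function names are made up for the example.

```python
import math


def weighted_mean(forecasts, weights):
    """Pointwise weighted average of equal-length forecast series."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


def trial_weights(losses, beta):
    """Normalized weights proportional to exp(-beta * loss).

    Subtracting the minimum loss first leaves the normalized weights
    unchanged and avoids underflow when losses are large.
    """
    best = min(losses)
    raw = [math.exp(-beta * (loss - best)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def forecast_variants(trials, beta, top_k=10):
    """Build the best-fit, top-k mean, and weighted-ensemble forecasts.

    `trials` is a list of (loss, forecast) pairs; `beta` is an assumed
    scale parameter for the weights (hypothetical, chosen by the caller).
    """
    ranked = sorted(trials, key=lambda trial: trial[0])
    losses = [loss for loss, _ in ranked]
    forecasts = [forecast for _, forecast in ranked]
    top = forecasts[:top_k]
    return {
        "best_fit": list(forecasts[0]),
        "top_k_mean": weighted_mean(top, [1.0 / len(top)] * len(top)),
        "ensemble_mean": weighted_mean(forecasts, trial_weights(losses, beta)),
    }


if __name__ == "__main__":
    # Toy trials: (validation loss in percent, two-day forecast).
    toy_trials = [
        (10.0, [120.0, 130.0]),
        (5.0, [100.0, 110.0]),
        (50.0, [300.0, 400.0]),
    ]
    for name, series in forecast_variants(toy_trials, beta=0.5, top_k=2).items():
        print(name, series)
```

With `beta = 0` the ensemble reduces to a plain average of all trials, and as `beta` grows it approaches the best-fit forecast, so the scale controls how strongly poorly fitting trials are discounted.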
[1] Stochastic Epidemic Models with Inference.
[2] Information and Exponential Families: In Statistical Theory.
[3] Algorithms for Hyper-Parameter Optimization.
[4] Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases.
[5] Worldometer Coronavirus Cases. 2020. Coronavirus Cases.
[6] Covid19India. 2020. Coronavirus in India: Latest Map and Case Count.
[7] Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand.
[8] Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates.
[9] The Mathematics of Infectious Diseases.
[10] A Spatiotemporal Epidemic Model to Quantify the Effects of Contact Tracing, Testing, and Containment.
[11] Modeling COVID-19 on a network: super-spreaders, testing and containment. medRxiv.
[12] COVID-19 stats in India. 2020. COVID-19 REST API for India.
[13] Rt COVID-19.

This study is made possible by the generous support of the American People through the United States Agency for International Development (USAID). The work described in this article was implemented under the TRACETB Project, managed by WIAI under the terms of Cooperative Agreement Number 72038620CA00006. The contents of this manuscript are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. We thank Anupama Agarwal, Disha Makhija, Mohit Kumar, Sumod Mohan and the COVID modeling team at Wadhwani AI for their contributions to the broader COVID-19 forecasting effort.