key: cord-0640406-bsjq0cio authors: Witt, Christian Schroeder de; Gram-Hansen, Bradley; Nardelli, Nantas; Gambardella, Andrew; Zinkov, Rob; Dokania, Puneet; Siddharth, N.; Espinosa-Gonzalez, Ana Belen; Darzi, Ara; Torr, Philip; Baydin, Atilim Gunes title: Simulation-Based Inference for Global Health Decisions date: 2020-05-14 journal: nan DOI: nan sha: 53864e7a301814c971357a5726e55a7d96e14781 doc_id: 640406 cord_uid: bsjq0cio

The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient, interpretable Bayesian inference within those simulators.

Machine learning has a growing role in increasing health service access and efficiency, particularly in resource-constrained settings, making it a valuable tool for the global health community [39, 54]. Moreover, the COVID-19 pandemic [55] has underlined the importance of epidemiological modelling and computer simulation in informing the design and implementation of public health interventions at an unprecedented scale [18]. For many endemic diseases (e.g., malaria), in-silico optimisation of multi-modal intervention portfolios, from mass vaccination to bed nets, is well established [47].
Analogous modelling for COVID-19 interventions, including social distancing [20], is mostly unexplored, yet subject to intense public interest [32].

The adoption of health informatics in health systems worldwide (e.g., OpenMRS [33], mHealth [1]) enables access to abundant patient-level and aggregated health data [54]. This is fostering the development of comprehensive modelling and simulation to support the design of health interventions and policies, and to guide decision-making in a variety of health system domains [22, 49]. For example, simulations have provided valuable insight into public health problems such as tobacco consumption in New Zealand [50], and diabetes and obesity in the US [58]. They have been used to explore policy options such as those in maternal and antenatal care in Uganda [44], and applied to evaluate health reform scenarios such as predicting changes in access to primary care services in Portugal [21]. Their applicability in informing the design of cancer screening programmes has also been discussed [42, 23]. Recently, simulations have informed the response to the COVID-19 outbreak [19].

The process of informing health interventions and policies through simulations generally involves two steps:

Model calibration. The extent to which a simulator can reliably inform real-world prediction and planning is bounded by both model discrepancy [13] and how well the model has been calibrated to empirical data [3].

Optimising decision-making. Identifying optimal multi-modal intervention strategies and the corresponding risks and uncertainties requires searching through potentially vast parameter spaces, which, due to the computational cost of running large simulators (e.g., in some epidemiological studies), usually cannot be exhaustively evaluated [46].

Despite their fundamental importance, model discrepancy and calibration of public-health simulators are frequently only informally addressed, or left undocumented [48, 40].
This may be partially explained by the fact that, while numerous methods for formal sensitivity and uncertainty analysis exist [28], they generally do not scale to complex simulators with more than a few dozen parameters [38]. Similarly, evidence-based decision-making is usually optimised by comparing outcomes on a small number of hand-crafted scenarios and intervention strategies [46].

Among the simplest mathematical epidemiology models are deterministic compartmental models that partition individuals in a population based on different stages of the disease³ [29, 2, 11]. Advances in model construction, computing power, and novel insights into medical and socio-economic aspects have since stimulated the introduction of stochastic individual-based models⁴ [41] to public health applications. These relatively complex and highly parametrised models are implemented as simulator software that allows studying the global effects of self-organization and emergent properties arising from individual interactions at the local level.

Figure 1: Latent probabilistic structure uncovered using PyProb from the Imperial College CovidSim simulator run on Malta, demonstrating the first step in working with this simulator as a probabilistic program. Uniform distributions are omitted for simplicity.

Inference in individual-based simulators is usually doubly intractable, as neither the simulator likelihood nor the evidence can be evaluated efficiently. Likelihood-free methods, including approximate Bayesian computation (ABC) [9], have been proposed [3, 16], but their cost scales exponentially with data dimension, requiring domain experts to define low-dimensional summary statistics, which ultimately determine the quality of inference. Recent advances in machine learning have led to a new family of promising approaches to simulation-based inference [see 15, for an overview].
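To make the likelihood-free setting concrete, the following minimal Python sketch runs rejection ABC on a toy chain-binomial SIR simulator. It is an illustration under stated assumptions, not the calibration procedure of CovidSim or OpenMalaria: the simulator, the uniform prior on the transmission rate, the two summary statistics (total cases and peak step), the distance, and all names and parameter values are hypothetical choices made for this example.

```python
import random


def simulate_sir(beta, gamma, population, initial_infected, steps, rng):
    """Toy chain-binomial SIR simulator; returns new infections per time step."""
    susceptible = population - initial_infected
    infected = initial_infected
    new_cases = []
    for _ in range(steps):
        # Chance that a susceptible is infected by at least one current infective.
        p_infection = 1.0 - (1.0 - beta / population) ** infected
        infections = sum(rng.random() < p_infection for _ in range(susceptible))
        recoveries = sum(rng.random() < gamma for _ in range(infected))
        susceptible -= infections
        infected += infections - recoveries
        new_cases.append(infections)
    return new_cases


def summary(new_cases):
    """Hand-crafted summary statistics: (total cases, index of the peak step)."""
    return sum(new_cases), new_cases.index(max(new_cases))


def abc_rejection(observed, n_draws, tolerance, rng,
                  gamma=0.2, population=100, initial_infected=2):
    """Rejection ABC: keep prior draws of beta whose simulated summaries
    lie within `tolerance` of the observed summaries."""
    steps = len(observed)
    obs_total, obs_peak = summary(observed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.1, 1.0)  # assumed prior over the transmission rate
        simulated = simulate_sir(beta, gamma, population, initial_infected, steps, rng)
        sim_total, sim_peak = summary(simulated)
        # Scale each term so that neither summary dominates the distance.
        distance = (abs(sim_total - obs_total) / population
                    + abs(sim_peak - obs_peak) / steps)
        if distance <= tolerance:
            accepted.append(beta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "observations" generated with an assumed true beta of 0.5.
    observed = simulate_sir(0.5, 0.2, 100, 2, 30, rng)
    posterior = abc_rejection(observed, n_draws=200, tolerance=0.2, rng=rng)
    print(f"accepted {len(posterior)} of 200 prior draws")
    if posterior:
        print(f"approximate posterior mean of beta: {sum(posterior) / len(posterior):.2f}")
```

The accepted draws approximate the posterior only as well as the chosen summaries allow, which is the dependence on hand-crafted statistics noted above. Tightening the tolerance improves the approximation but lowers the acceptance rate.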
Among the recent simulation-based inference approaches, we argue that probabilistic programming [see 53, for an introduction] has a unique potential to standardise and automate model calibration and decision-making in individual-based simulators. Probabilistic programming allows one to express probability models using computer code and perform statistical inference over the inputs and latent variables of the program, conditioned on data observations (or constraints). This is achieved by using special-purpose probabilistic programming languages (PPLs) [24, 43, 26]. Within such a framework, one could, for instance, condition on desired health outcomes (e.g., ICU capacity not being exceeded in a pandemic), and derive detailed posterior distributions over all interactions defined by the simulator [56], providing insights on interventions effecting a desired outcome, with proper uncertainty quantification at all stages.

³ E.g., susceptible-infectious-recovered (SIR).
⁴ Also referred to as agent-based models or multi-agent systems.

To further enhance the applicability of simulation-based inference in this domain, we highlight several opportunities for further method development. Automated amortisation by surrogate methods [25, 36, 34], which aim to automatically identify compute-intensive parts of a simulator and replace them with less expensive emulators, could be guided by the causal structure inherent to a simulator (Figure 1), such as many repeated, structurally identical stochastic time steps or multi-agent interactions that might be amenable to mean-field approximations [57]. In addition, pre-existing simulators could be turned into differentiable programs [6] through automated source-to-source transformations, thus allowing for the use of gradient-based optimisation and inference methods, including Hamiltonian Monte Carlo [37].
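The idea of conditioning a simulator on a desired outcome, such as ICU capacity not being exceeded, can be illustrated with a deliberately small sketch in plain Python. It does not use the PyProb API or the CovidSim code: the deterministic SIR recursion, the priors, the ICU fraction, the capacity threshold, and all function names are assumptions introduced for this example. Inference here is plain rejection sampling from the prior, the simplest and least efficient way to condition a generative program on a hard constraint.

```python
import random


def peak_icu_demand(r0, contact_reduction, days=200, gamma=0.2, icu_fraction=0.02):
    """Deterministic SIR recursion on population fractions (one step per day).
    Returns the peak ICU demand as a fraction of the population (assumed model)."""
    beta = r0 * gamma * (1.0 - contact_reduction)
    susceptible, infected = 0.999, 0.001
    peak = infected
    for _ in range(days):
        new_infections = beta * susceptible * infected
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return icu_fraction * peak


def model(rng):
    """Generative program: sample latent inputs from assumed priors, then simulate."""
    r0 = rng.uniform(2.0, 3.0)                 # assumed prior on R0
    contact_reduction = rng.uniform(0.0, 1.0)  # assumed prior on intervention strength
    latents = {"r0": r0, "contact_reduction": contact_reduction}
    return latents, peak_icu_demand(r0, contact_reduction)


def condition_on_outcome(capacity, n_samples, rng):
    """Keep the latents of all prior samples whose peak ICU demand stays
    within `capacity`; the kept samples approximate the posterior."""
    kept = []
    for _ in range(n_samples):
        latents, peak = model(rng)
        if peak <= capacity:
            kept.append(latents)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical capacity: ICU beds for 0.2% of the population.
    posterior = condition_on_outcome(capacity=0.002, n_samples=2000, rng=rng)
    print(f"{len(posterior)} of 2000 prior samples satisfy the constraint")
    if posterior:
        mean_reduction = sum(s["contact_reduction"] for s in posterior) / len(posterior)
        print(f"mean contact reduction given the outcome: {mean_reduction:.2f} (prior mean 0.50)")
```

Comparing the retained samples with the prior indicates which levels of contact reduction are compatible with the desired outcome. A probabilistic programming system is intended to automate this pattern for existing simulators and to replace the naive sampler with more efficient inference engines.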
As a further opportunity, the unified interface specification afforded by probabilistic programming could also make simulators amenable to other techniques from simulation-based inference and control, including dynamic programming and reinforcement learning [31, 52, 27].

To foster the development of a new standardised approach to model calibration and evidence-based decision-making in public health, we are working on instrumenting the existing CovidSim [19] and OpenMalaria [45] simulators with a probabilistic programming interface through the PyProb library. We will publicly release our code to provide out-of-the-box probabilistic programming inference over public health scenarios of interest in these two domains.

We expect the mentioned techniques to play a role in dealing with communicable (infectious) diseases, which already imposed a significant burden on health systems in developing countries [35] before the worldwide impact of COVID-19. However, they are also applicable to non-communicable diseases, such as diabetes and cancer, which are recognised as major causes of morbidity and mortality worldwide [4, 30]. This will add to the already identified potential of machine learning in health policy [5] and in improving health access, emphasising its value for global health in the efforts to achieve universal health coverage and the sustainable development goals [39, 54].

[1] Sasan Adibi. Mobile Health: A Technology Road Map, volume 5. Springer, 2015.
[2] Linda JS Allen. Some discrete-time SI, SIR, and SIS epidemic models.
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand
Strategies for mitigating an influenza pandemic
Using discrete event simulation to compare the performance of family health unit and primary health care centre organizational models in Portugal
Systematic review of the use and value of computer simulation modelling in population health and health care delivery
The role of modelling in the policy decision making process for cancer screening: example of prostate specific antigen screening
Church: a language for generative models
Efficient Bayesian inference for nested simulators
Hijacking malaria simulators with probabilistic programming
Multitask soft option learning
Bayesian calibration of computer models
Containing papers of a mathematical and physical character
High-quality health systems in the sustainable development goals era: time for a revolution
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Covid-19: public health experts demand evidence behind UK's short self-isolation advice
Cooking up an open source EMR for developing countries: OpenMRS - a recipe for successful collaboration
Deep Probabilistic Surrogate Networks for Universal Simulator Approximation
Global, regional, and national incidence and mortality for HIV, tuberculosis, and malaria during 1990-2013: a systematic analysis for the global burden of disease study
Amortized rejection sampling in universal probabilistic programming
MCMC using Hamiltonian dynamics
Probabilistic sensitivity analysis of complex models: a Bayesian approach
Artificial intelligence, machine learning and health systems
Mathematical models for the study of HIV spread and control amongst men who have sex with men
Agent-Based and Individual-Based Modeling: A Practical Introduction
Causal system modelling of cervical cancer screening. The Lancet Public Health
Probabilistic programming in Python using PyMC3
Applying a system dynamics modelling approach to explore policy options for improving neonatal health in Uganda
Mathematical modeling of the impact of malaria vaccines on the clinical epidemiology and natural history of plasmodium falciparum malaria: Overview
Ensemble Modeling of the Likely Public Health Impact of a Pre-Erythrocytic Malaria Vaccine
Malaria Modeling in the Era of Eradication
Calibration methods used in cancer simulation models and suggested reporting guidelines
Rethinking health systems strengthening: key systems thinking tools and strategies for transformational change
Application of a system dynamics model to inform investment in smoking cessation services in New Zealand
Simple, distributed, and accelerated probabilistic programming
Black-Box Policy Search with Probabilistic Programs
An introduction to probabilistic programming
Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings?
Planning as Inference in Epidemiological Models
Impact of different policies on unhealthy dietary behaviors in an urban adult population: an agent-based simulation model