key: cord-1014060-mdpyzr7c authors: Kirpich, A.; Koniukhovskii, V.; Shvartc, V.; Skums, P.; Weppelmann, T. A.; Imyanitov, E.; Semyonov, S.; Barsukov, K.; Gankin, Y. title: Development of an interactive, agent-based local stochastic model of COVID-19 transmission and evaluation of mitigation strategies illustrated for the state of Massachusetts, USA date: 2020-05-21 journal: nan DOI: 10.1101/2020.05.17.20104901 sha: b1929b4cbd1b7e4e98483a2535ccfae808ab41c2 doc_id: 1014060 cord_uid: mdpyzr7c
Since its discovery in the Hubei province of China, the global spread of the novel coronavirus SARS-CoV-2 has resulted in millions of COVID-19 cases and hundreds of thousands of deaths. Its spread throughout Asia, Europe, and the Americas has presented one of the greatest infectious disease threats in recent history and has tested the capacity of global health infrastructures. Since no effective vaccine is available, isolation techniques to prevent infection, such as home quarantine and social distancing in public, have remained the cornerstone of public health interventions. Government and health officials were charged with implementing stay-at-home strategies, yet many had little guidance on the consequences of how quickly to begin them. Moreover, as local epidemic curves have been flattened, the same officials must wrestle with when to ease or lift such restrictions without causing economic turmoil. To evaluate the effects of quarantine strategies during the initial epidemic, we created an agent-based modeling framework that accounts for local spread based on geographic and population data, with a corresponding interactive desktop and web-based application. Using the state of Massachusetts in the United States of America as an example, we illustrate the consequences of implementing quarantines at different time points after the initial seeding of the state with COVID-19 cases.
Furthermore, we suggest that this application can be adapted to other states, small countries, or regions within a country to provide decision makers with the critical information necessary to best protect human health.
The epidemic of a novel coronavirus was first detected in the city of Wuhan in the Chinese province of Hubei in December 2019 [1] [2] [3] [4]. Despite unprecedented efforts by the Chinese authorities, including the complete lockdown of the entire city of Wuhan on January 22, 2020, the virus rapidly spread to all continents except Antarctica. The World Health Organization (WHO) officially declared the outbreak a global pandemic on March 11, 2020 [5], only three months after the first case was detected. The novel coronavirus is now officially named SARS-CoV-2, and the disease it causes has been named COVID-19 [6] to distinguish it from SARS-CoV and the corresponding severe acute respiratory syndrome (SARS) epidemic of 2003 [7] [8]. Despite its much lower case-fatality rate, SARS-CoV-2 has caused morbidity and mortality orders of magnitude higher than SARS and Middle East respiratory syndrome (MERS) combined [9]. As of May 13, 2020, more than 4.1 million infections had been reported worldwide, with more than 287,000 deaths due to complications of COVID-19 [10]. As of May 2020, there is neither an effective virus-specific treatment nor a Food and Drug Administration (FDA) approved vaccine for SARS-CoV-2 [11] [12] [13] [14]. As such, social distancing and quarantine are the only available measures to reduce transmission and to keep existing healthcare systems from being overwhelmed. Starting at the epicenter in Hubei province [15] in January 2020, governments around the world have implemented society-wide lockdown measures of varying degrees [16] [17] [18] [19] [20].
Since such measures remain the only available tools to control the spread of the pandemic, it is critical to understand the transmission dynamics of SARS-CoV-2 in the population. This would allow for the prediction of COVID-19 cases and deaths over time under the different mitigation strategies that could be implemented to reduce morbidity and mortality, as well as for the allocation of limited resources to medical providers.
To achieve this goal, multiple approaches can be used, and the choice among them is typically driven by the quality and precision of the available data. The most commonly reported epidemic data are the incidence of new cases and deaths, represented as time series over fixed intervals (e.g., days or weeks) and aggregated across multiple regions and reporting sources [21] [22] [23]. These aggregated data can be used for incidence curve reconstruction, modeling, and prediction when more detailed information about each infected individual is not available [24] [25] [26] [27] [28]. Such models are called compartmental models [29]: the study population is divided into groups, and individuals within each group are assumed to share the same characteristics of interest (e.g., susceptible, infected, vaccinated, or immune). Compartmental models are formulated as a system of differential equations, which admits both deterministic and stochastic formulations; the stochastic formulations allow the uncertainty of the model fit to be quantified. These models provide insight into the underlying epidemic dynamics and allow for the [...] can also be depersonalized in accordance with HIPAA regulations [35] to make the model versatile and to avoid violating the privacy of individuals.
$$Q_k = \big((x_k, y_k),\ t^{\mathrm{inf}}_k,\ det_k,\ stg_k,\ age_k,\ rad_k,\ p_{cont(k)},\ cont_k,\ R_{0(k)},\ sever_k,\ dur_k,\ st_k(t)\big).$$
• $stg_k = (stg_{1(k)}, stg_{2(k)}, stg_{3(k)})$: the vector of durations, measured in days, of the three disease infection stages that characterize the severity of the disease. It is assumed that $stg_{1(k)} + stg_{2(k)} + stg_{3(k)} = det_k$;
• $age_k$: the age of the individual $Q_k$ at the time of infection onset;
• $rad_k$: the distance (in meters) up to which the individual $Q_k$ is able to infect nearby individuals;
• $p_{cont(k)}$: the probability that on each day the individual $Q_k$ has any contacts that lead to new infections;
• $cont_k = (\mu_{cont(k)}, \sigma^2_{cont(k)})$: the individual-specific parameters that define the distribution of the number of successful infection transmissions to other individuals within a given day. This number is generated randomly for each day $t$, provided that the individual has any transmissions on that day (according to the contact probability $p_{cont(k)}$).
• $sever_k$: the disease severity variable for the individual $Q_k$, which takes three values: 1 corresponds to lethal, 2 to severe, and 3 to mild. The disease severity does not change for a given individual after it is determined randomly from a trinomial distribution;
• $dur_k$: the disease duration in days from infection onset to cure (or death), generated randomly based on the $sever_k$ parameter;
• $st_k(t)$: the status of the individual $Q_k$ on day $t$. The status of the individuals is expected to change over the course of the simulation and takes the following values:
  $st_k(t) = 0$: the individual is detected based on external information, i.e., from the reported data used as the model input;
  $st_k(t) = 1$: the individual is infected but has not yet been identified as such;
  $st_k(t) = 2$: the individual has been infected and detected as such, which also implies the individual's isolation (quarantine);
  $st_k(t) = 3$: the individual has recovered and is immune;
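The per-agent record and the daily transmission step described in the list above can be sketched in Python. This is a minimal, hypothetical sketch and not the authors' implementation: the class and function names are invented, only some of the listed attributes are modeled, and $det_k$ is kept as an opaque integer because the stage-sum constraint is all this passage states about it. The text does not name the distribution parameterized by $cont_k$, so a normal draw rounded and clipped at zero stands in for it; the rule that only undetected infected agents transmit is likewise an assumption, loosely motivated by status 2 implying isolation.

```python
import random
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Status codes st_k(t) as listed above."""
    EXTERNAL_DETECTED = 0    # detected from the reported input data
    INFECTED_UNDETECTED = 1  # infected, not yet identified
    DETECTED_ISOLATED = 2    # detected and therefore quarantined
    RECOVERED = 3            # recovered and immune


class Severity(IntEnum):
    LETHAL = 1
    SEVERE = 2
    MILD = 3


@dataclass
class Agent:
    x: float
    y: float
    age: int
    rad: float        # rad_k: infection radius in meters
    p_cont: float     # p_cont(k): daily probability of any infecting contact
    mu_cont: float    # mean of the daily number of transmissions
    var_cont: float   # variance of the daily number of transmissions
    stg: tuple        # (stg_1, stg_2, stg_3) stage durations in days
    det: int          # det_k, treated as an opaque integer here
    severity: Severity
    status: Status = Status.INFECTED_UNDETECTED

    def __post_init__(self):
        # Constraint stated in the text: stg_1 + stg_2 + stg_3 = det_k.
        if sum(self.stg) != self.det:
            raise ValueError("stage durations must sum to det")


def draw_severity(rng, p_lethal, p_severe):
    """Draw sever_k once from a trinomial (categorical) distribution."""
    u = rng.random()
    if u < p_lethal:
        return Severity.LETHAL
    if u < p_lethal + p_severe:
        return Severity.SEVERE
    return Severity.MILD


def daily_transmissions(agent, rng):
    """Number of new infections caused by `agent` on one day.

    ASSUMPTION: rounded, zero-clipped normal draw with mean mu_cont and
    variance var_cont; the source does not specify the distribution.
    """
    if agent.status != Status.INFECTED_UNDETECTED:
        return 0  # hypothetical rule: only undetected infected agents transmit
    if rng.random() >= agent.p_cont:
        return 0  # no infecting contacts on this day
    draw = rng.gauss(agent.mu_cont, agent.var_cont ** 0.5)
    return max(0, round(draw))


if __name__ == "__main__":
    rng = random.Random(7)
    # Arbitrary illustrative parameter values, not taken from the paper.
    agent = Agent(x=0.0, y=0.0, age=45, rad=50.0, p_cont=0.3,
                  mu_cont=2.0, var_cont=1.0, stg=(3, 4, 5), det=12,
                  severity=draw_severity(rng, 0.01, 0.15))
    print([daily_transmissions(agent, rng) for _ in range(10)])
```

Drawing the severity once at construction mirrors the statement that $sever_k$ does not change after it is determined; the per-day draw in `daily_transmissions` mirrors the two-step logic (first $p_{cont(k)}$, then the count).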
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity; this version was posted May 21, 2020. It is made available under a CC-BY-NC-ND 4.0 International license.
[...] coordinates for those places where the initial outbreaks were detected or to the centers of the corresponding aggregated geographic units. The latter may be the case if either the exact infection acquisition locations are not known or privacy concerns prevent the inclusion of such data in the model. In that case, the centers of the aggregated geographic units are taken as the epicenters $E_i$ for each $i = 1, 2, \ldots, I$. The local epicenters in the model are defined by a pair of geographic coordinates $(Lat, Long)$ and by an epicenter-specific region radius $R_i$, defined in meters. Therefore, for $i = 1, 2, \ldots, I$ the epicenter region is defined by a triplet:
$$E_i = (Lat_i, Long_i, R_i).$$
The epicenter regions are defined from the surveillance epidemiological data. As initial conditions, in addition to the local epicenters, the model incorporates the areas of high density $P = \{P_1, P_2, \ldots, P_J\}$, where each $P_j$, $j = 1, 2, \ldots, J$, represents a large city or a densely populated area and is also defined by a triplet. [...]
In the model, the reporting times (days) for the initial index cases for each epicenter $i$ [...], where $k = 1, 2, \ldots, K_S$ is the global index for the initial cases across all times $\tilde{t}_1, \tilde{t}_2, \ldots, \tilde{t}_S$ and $K_S$ is the total number of initial index cases simulated within the model based on the input data. The tilde notation for the $\tilde{Q}_k$ in $D$ emphasizes the link to the model input data. The time index corresponding to an individual day within the model is denoted $t$ and equals 0 at the model baseline. The simulation baseline time $t = 0$ corresponds to the latest reporting time $\tilde{t}_S$ of the earliest reported cases used as the model input.
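The initial conditions described above (epicenter triplets $E_i = (Lat_i, Long_i, R_i)$, placement of index cases, and re-indexing of days so that $t = 0$ falls on the latest reporting date $\tilde{t}_S$) can be sketched as follows, together with the earlier simulation start that the following paragraph calls $T_{min}$. Several assumptions are not stated in the text: index cases are placed uniformly inside the epicenter disk, distances use a spherical Earth with a local flat approximation (adequate for radii of a few kilometers), and each index case carries a known pre-reporting infectivity period in days. All names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


@dataclass(frozen=True)
class Epicenter:
    """E_i = (Lat_i, Long_i, R_i): coordinates in degrees, radius in meters."""
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sample_in_epicenter(e, rng):
    """Place one index case uniformly inside the epicenter disk (assumption)."""
    r = e.radius_m * math.sqrt(rng.random())  # sqrt -> uniform density over area
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Convert north/east offsets in meters to degrees (local flat approximation).
    dlat = (r * math.cos(theta)) / EARTH_RADIUS_M
    dlon = (r * math.sin(theta)) / (EARTH_RADIUS_M * math.cos(math.radians(e.lat)))
    return e.lat + math.degrees(dlat), e.lon + math.degrees(dlon)


def model_days(report_dates):
    """Map reporting dates to model days so that t = 0 is the latest
    reporting date among the index cases (t~_S in the text)."""
    baseline = max(report_dates)
    return [(d - baseline).days for d in report_dates]


def simulation_start(report_days, infectivity_days):
    """T_min: earliest infection time once each index case's
    pre-reporting infectivity period is subtracted (hypothetical helper)."""
    return min(t - p for t, p in zip(report_days, infectivity_days))
```

For example, index cases reported on March 1, 3, and 5 map to model days $-4$, $-2$, and $0$; with infectivity periods of 5, 2, and 3 days, the simulation would start at $T_{min} = -9$, consistent with $T_{min} < 0$.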
The actual infection times for those index cases precede the selected baseline simulation time $t = 0$ because of the infectivity periods generated for those index cases prior to their reporting. The actual simulation starting time that accounts for these infectivity periods is denoted $t = T_{min}$ and is smaller than the baseline time $t = 0$. This [...]
The first group of the reported indexes from (6) is used for model calibration, and the optimization is performed by minimizing the sum of squared differences between the model outputs and the calibration data using the Nelder-Mead numerical minimization method [38]. Additional details about the model formulation, parameterization, and calibration are summarized in S1 Appendix. [...] Tables 1 and 2 for the model-predicted cases and deaths, respectively. For example, the summaries from Table 1 can be compared one month after the baseline date, i.e., on [...] in a 58% reduction in cumulative cases on April 26, 2020, a 63% reduction on May 26, 2020, and a 65% reduction on June 26, 2020. Compared to the quarantine start date in the third scenario, the first scenario results in an 81% reduction in cumulative cases on April 26, 2020, an 87% reduction on May 26, 2020, and an 88% reduction on June 26, 2020.
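The calibration step (least squares minimized with Nelder-Mead) can be sketched as follows. This is a minimal illustration under heavy assumptions: a toy exponential curve stands in for the agent-based simulator, the calibration data are synthetic, and the optimizer is a textbook Nelder-Mead with standard coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), not the authors' code. Because the real simulator is stochastic, its output would presumably need to be averaged over repeated runs before computing the squared differences; the authors' exact procedure is given in their S1 Appendix.

```python
import math


def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f with the Nelder-Mead simplex method (standard coefficients)."""
    n = len(x0)
    alpha, gamma, rho, sigma = 1.0, 2.0, 0.5, 0.5
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        xr = [c + alpha * (c - w) for c, w in zip(centroid, worst)]  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = [c + gamma * (r - c) for c, r in zip(centroid, xr)]  # expansion
            fe = f(xe)
            if fe < fr:
                simplex[-1], values[-1] = xe, fe
            else:
                simplex[-1], values[-1] = xr, fr
        else:
            xc = [c + rho * (w - c) for c, w in zip(centroid, worst)]  # contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + sigma * (v - b) for b, v in zip(best, vert)]
                                    for vert in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best = min(range(n + 1), key=lambda i: values[i])
    return simplex[best], values[best]


def sse(params, model, times, observed):
    """Sum of squared differences between model output and calibration data."""
    return sum((model(params, t) - y) ** 2 for t, y in zip(times, observed))


def exp_growth(params, t):
    """Toy stand-in for the simulator output (hypothetical)."""
    a, b = params
    return a * math.exp(b * t)
```

Usage with synthetic data (not study data): build `observed = [exp_growth((10.0, 0.2), t) for t in range(15)]`, then call `nelder_mead(lambda p: sse(p, exp_growth, list(range(15)), observed), [5.0, 0.1])`. For production use, `scipy.optimize.minimize(..., method="Nelder-Mead")` provides a tested implementation of the same method.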
An example of the summary graphs of the model outputs for the second scenario from Tables 1 and 2 is presented in Fig 3, which contains the four combined graphs available in the "Statistics" tab in the top right corner of the tool. These graphs can be produced within the tool by setting the "Max Simulation Time" and "Forecast Day" fields to July 15, 2020 and running the model 500 times with the "Daily Forecast Evaluation" button. The 500 runs are used to produce the median predictions and the corresponding 90% prediction bands, obtained by taking the 5th and 95th percentiles across those simulations for each time slot. The graphs include the cumulative numbers of reported cases and deaths, together with the currently hospitalized patients and unreported cases. The graphs also show the reported data in blue. The calibration time period is highlighted in blue and is bounded by vertical bars.
In this work, a local agent-based modeling framework for respiratory diseases has been presented. This framework incorporates the reported geographic incidence data that are typically available from surveillance, which include the individual's age, infection status, [...] implemented at different times to compare different quarantine scenarios.
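The run summary described in the Fig 3 paragraph (median plus a 90% band from the 5th and 95th percentiles of 500 runs, computed separately for each time slot) can be sketched as follows. This is a minimal illustration assuming each run is stored as a list of daily values of equal length; the percentile convention (linear interpolation between order statistics, which matches NumPy's default) is an assumption, since the text does not state which one the tool uses.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order
    statistics (NumPy's default 'linear' convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def prediction_bands(runs, lower=5.0, upper=95.0):
    """Per-time-slot (lower, median, upper) summary across simulation runs.

    `runs` is a list of equally long trajectories, one per simulation run.
    """
    n_slots = len(runs[0])
    bands = []
    for t in range(n_slots):
        slot = [run[t] for run in runs]  # all runs' values for this time slot
        bands.append((percentile(slot, lower),
                      percentile(slot, 50.0),
                      percentile(slot, upper)))
    return bands
```

With 500 stored trajectories, for example `runs = [simulate(seed) for seed in range(500)]` where `simulate` is a hypothetical single-run function, `prediction_bands(runs)` returns one (5th, median, 95th) triple per day. Summarizing each time slot independently gives pointwise bands, not bands that contain entire trajectories.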
As expected, 319 there was a decrease in the cumulative incidence and deaths inversely proportional to 320 the date quarantine was implemented; which resulted in approximately 50-80% 321 reduction in cases and deaths depending on the scenario. Compared to complex agent-based models, the compartmental models are based on 323 the assumptions of homogeneous mixing and can be parameterized by a relatively small 324 set of rates and initial conditions. The main challenges for the compartmental 325 models [26] [27] [43] are the determination of the compartment types that are used in 326 the model, the assignment of individuals between compartment i.e. the specifications of 327 the set of rules that assign each particular individual to each type of compartments, and 328 the determination of the parameters of interest which can either be postulated from 329 external sources or estimated from data. The agent based models, due to their inherited 330 complexity, incorporate separate individuals with multiple different characteristics and 331 parameters per individual. This adds another layer of parameterization flexibility, but 332 also introduces another layer of modeling challenges, since the number of individual's 333 characteristics within the model is determined by the modeler [30] [31] [32] . Ideally, the 334 May 17, 2020 10/14 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 21, 2020. . with the goal of predictions and intervention studies; 3) utilize the available surveillance 339 and public health data in the best possible way. 
The best possible way in this context 340 means, that all the information from the data that can be used to answer the questions 341 of interest are utilized, while the number of assumptions within the model beyond the 342 information available from the data is the smallest possible that is necessary to 343 implement the model. In the case of COVID-19, an epidemic which has quickly evolved into a pandemic, 345 the local epidemic developments in every region are expected to have different dynamics 346 influenced by multiple region-specific factors. Thus, an agent based model which utilizes 347 local settings is likely superior to a global agent-based model in this setting and can be 348 implemented with minimal inputs as long as local data are available. In this example, 349 we chose regional data for the state of Massachusetts, however we believe this 350 framework and interactive tool could be adopted and useful for small or middle size 351 countries or other administrative districts within a larger country, that have comparable 352 reporting and data quality across different administrative regions. May 17, 2020 11/14 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 21, 2020. 
In this paper, we have presented a novel, localized agent-based model that can be used with minimal input data, which is publicly available and tailored to the population [...]
References
Clinical features of patients infected with 2019 novel coronavirus in Wuhan.
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study.
A novel coronavirus emerging in China: key questions for impact assessment.
A novel coronavirus from patients with pneumonia in China.
The severe acute respiratory syndrome.
COVID-19 has killed more people than SARS and MERS combined, despite lower case fatality rate.
A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version).
Measures for diagnosing and treating infections by a novel coronavirus responsible for a pneumonia outbreak originating in Wuhan, China. Microbes and Infection.
The SARS-CoV-2 vaccine pipeline: an overview.
SARS-CoV-2 vaccines: status report.
The Guardian.
Google: Coronavirus (COVID-19).
Johns Hopkins University: COVID-19 Case Tracker.
WHO: Daily Situation Reports.
Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries.
Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases.
Real-time forecasts of the COVID-19 epidemic in China from [...]
Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang.
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. The Lancet Public Health.
Compartmental models in epidemiology.
Using data-driven agent-based models for forecasting emerging infectious diseases.
FluTE, a publicly available stochastic influenza epidemic simulation model.
Modelling disease outbreaks in realistic urban social networks.
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak.
Interventions to mitigate early spread of SARS-CoV-2 in Singapore: a modelling study. The Lancet Infectious Diseases.
Health Insurance Portability and Accountability Act (HIPAA). In: StatPearls [Internet].
[...] Wiley Online Library; 2015.
37. www.mass.gov
A simplex method for function minimization.
A novel sub-epidemic modeling framework for short-term forecasting epidemic waves.