key: cord-0989838-8g64u3ux
authors: Yeo, Yao Yu; Yeo, Yao-Rui; Yeo, Wan-Jin
title: A Computational Model for Estimating the Progression of COVID-19 Cases in the US West and East Coasts
date: 2020-03-27
journal: nan
DOI: 10.1101/2020.03.24.20043026
sha: cf779757265b3301e6e0672c5212b3dd55fe8a12
doc_id: 989838
cord_uid: 8g64u3ux

The ongoing coronavirus disease 2019 (COVID-19) pandemic is of global concern and has recently emerged in the US. In this paper, we construct a stochastic variant of the SEIR model to make a quasi-worst-case scenario prediction of the COVID-19 outbreak in the US West and East Coasts. The model is then fitted to current data and implemented using Runge-Kutta methods. Our computational results predict that the number of new cases would peak around mid-April and begin to abate by July, and that the number of cases of COVID-19 might be significantly mitigated by having greater numbers of functional testing kits available for screening. The model also shows how small changes in variables can make large differences in outcomes and highlights the importance of healthcare preparedness during pandemics.

Coronaviruses (CoVs) comprise a family of enveloped positive-stranded RNA viruses that are known to infect a broad range of animals, including mammals and birds. CoVs have been prevalent worldwide for several decades [8, 26] and cause diseases generally associated with the respiratory, gastrointestinal, hepatic, and nervous systems. To date, there are seven CoVs that infect humans and are associated with respiratory symptoms. Four CoVs (HCoV-229E, HCoV-NL63, HCoV-OC43, and HKU1) present relatively mild respiratory tract infections [26], whereas the other three CoVs, Severe Acute Respiratory Syndrome-related Coronavirus (SARS-CoV), Middle East Respiratory Syndrome-related Coronavirus (MERS-CoV), and the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), are considerably more virulent and have caused outbreaks with pandemic potential.
They are responsible respectively for the 2002-2003 SARS outbreak [6, 15], the 2012 MERS outbreak [36], and the ongoing COVID-19 pandemic [37]. As CoVs have a global distribution, exhibit considerable genetic diversity and genomic recombination, and have zoonotic potential while sympatry between humans and wildlife increases [5, 26], it is very likely that there will be new CoV outbreaks in the future.

The ongoing COVID-19 pandemic (previously 2019-nCoV) originated in Wuhan city in the Hubei province of China [37]. While the first case was reported in December 2019, the disease is now thought to have emerged as early as November 2019 [25]. Since then, COVID-19 has continued to spread around the world, and to date over 150 countries have been affected [3]. Although some countries have managed to contain COVID-19 efficiently, others previously thought to have been well-prepared for outbreaks due to higher living standards and healthcare quality have witnessed an unexpected number of cases [24]. As a result, the scale of COVID-19 within each nation has become relatively uncertain, which has only heightened social and economic unrest.

medRxiv preprint doi: https://doi.org/10.1101/2020.03.24.20043026; this version posted March 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

COVID-19 was first reported in the US on January 20, 2020 [11]. While little action was taken in the initial days, an exponential increase in the number of cases [24] spurred immediate actions in an attempt to contain COVID-19. For instance, social distancing is being enforced via the closure of educational institutions, restrictions on travel, and suspension of events, and research funding for SARS-CoV-2 has increased.
Unfortunately, containment of COVID-19 has been hindered by various events, such as the initial production of defective test kits [27], the current limited availability of test kits [28, 29], and a lack of medical supplies [30, 31]. It is currently unclear how well the US healthcare system will cope with the COVID-19 pandemic, especially in view of the lack of adequate projections of the scale of COVID-19 infections and mortality across the country. In this paper, we attempt to construct a mathematical model that simulates the scale of the COVID-19 outbreak in the US.

Despite the uncertainty of the pathogenicity and molecular virology of SARS-CoV-2 [23], a few key features pertaining to its transmissibility have been studied. The basic reproduction number $R_0$, which represents the average number of new infections an infectious person can cause in a naïve population, generally ranges from 2.3 to 2.9 [22, 33, 38]. The incubation period (the time between initial exposure and development of symptoms) is approximately 5.2 days on average [16, 19], and the infectious period (the time from onset of symptoms to isolation) is approximately 3 days on average [20, 21]. These transmission features are useful parameters to consider for modeling, which we will describe in detail next.

To model the COVID-19 outbreak, we use a variant of the SEIR model (see [7, 18] for some examples). We mainly focus on the East and West Coasts of the United States. We assume that there is no travel between the different population zones, and that the natural birth and death rates are equal.
In Figure 1, $S$ represents the susceptible population, and $E$ represents the exposed population (i.e. individuals who have been infected but are not yet themselves infectious). The infected population $I$ is divided into two groups, $I_H$ and $I_C$, wherein the subscripts H and C stand for "hospital" and "community" respectively. $I_H$ represents those that are infected and isolated (such as those that have been pre-tested and found to carry the virus), and $I_C$ those that are infected but not isolated (such as those with unreported cases or with mild symptoms that are overlooked). Thus, people in $I_H$ are unable to spread the virus whereas those in $I_C$ can spread the virus. As testing kits in the US are currently in low supply, the current model projects significantly higher levels of $I_C$ than $I_H$ at any given point in time. Once infected, there are two possibilities: either recovery or death. These outcomes are represented by the populations $R$ and $D$ above, with subscripts as designated above. We also assume the recovered population will have acquired immunity to the virus and is no longer susceptible.

In the mathematical model underlying Figure 1, we treat the various symbols above as functions over time, and the precise variables for the ordinary differential equations are as follows.

Variable    Definition
$\alpha$    The reciprocal of the incubation period
$\gamma$    The reciprocal of the infectious period
$R_0$       The basic reproduction rate, defined to be the value $\beta / \gamma$
$\beta$     The contact rate per unit time
$\rho$      The proportion of exposed that are pre-tested
$c$         The proportion of infected that dies

The expression $N = S + E + I_H + I_C + R_H + R_C$ represents the effective total population in the US, i.e.
the population that matters for the purposes of virus transmission. Then, for each coast in the US, the discussion above can be mathematically modeled as follows.

Notice that $R_0$ is not present in the model above. However, $R_0$ is important as the contact rate $\beta$ is hard to estimate directly, and we will use $R_0$ and $\gamma$ to estimate $\beta$ using the relation $\beta = R_0 \gamma$. The basic reproduction rate is also not static over time, and we will use different values over the following three phases:
• Phase 1: No action is taken against the epidemic.
• Phase 2: Some action is taken to slow down the epidemic, and the coast is preparing for the next phase.
• Phase 3: Coast shutdown; schools moved online, events canceled, and most public areas closed.
To prepare for a quasi-worst-case-scenario, we assume $R_0 \ge 1$. This is because $R_0 = 1$ represents a neutral reproduction rate, whereas $R_0 < 1$ and $R_0 > 1$ represent less and more spread, respectively. The next section describes in detail how we simulate our model.

We simulated our model in MATLAB to prepare for a quasi-worst-case-scenario. We employed a fourth-order Runge-Kutta method [12] to numerically solve the ordinary differential equations specified above, with the range $t \in [0, 250]$ and step size $h = 1/3$ (both measured in days). As for the initial conditions for each coast, we assume that there will not be any pre-existing immune responses that may help defend against the virus, due to SARS-CoV-2 being sufficiently divergent from other CoVs [23]. Hence, the entire population is initially susceptible to the virus. For example, if a single infected person is introduced at $t_0 = 0$ to a population of 53 million people (e.g.
West Coast US, including Seattle), we will then have $S(t_0) = 53 \times 10^6$, $I_C(t_0) = 1$, and all other compartments equal to zero. We set the proportion of pre-tested exposed individuals $\rho$ to 0.1 and the mortality rate $c$ to 0.05 [2, 35]. The basic reproduction rate $R_0$ is specified based on the government response to the outbreak (the three phases); for instance, if a partial shutdown is implemented at time $t_1$, and then a full shutdown at $t_2$, we will have $1 \le R_0(t_2 \le t \le 250) \le R_0(t_1 \le t \le t_2) \le R_0(0 \le t \le t_1)$. In our case, using the estimates obtained in [21] and depending on the coast being studied, we set the first phase $R_0(0 \le t \le t_1)$ to be either 2.7 or 2.9, the second phase $R_0(t_1 \le t \le t_2)$ to be either 2.3 or 2.5, and the third phase $R_0(t_2 \le t \le 250) =$
The West Coast will take the lower of the two values to account for the lower average population density.

To account for the variations of the incubation period $\alpha^{-1}$ and the infectious period $\gamma^{-1}$ within the population, we sampled 100 values of each of them from an Erlang distribution of shape 2 [21] at each timestep $t_n$. Let us denote these 100 values as $\alpha_i^{-1}(t_n)$ and $\gamma_i^{-1}(t_n)$, where $i = 1, \dots, 100$. The Erlang distribution from which we sampled $\alpha_i^{-1}$ has a mean of 5.2 days, and the one for $\gamma_i^{-1}$ has a mean of 3 days, as suggested by [16, 19, 20]. The minimum of the distribution range is also restricted to be 1, i.e. the minimum possible incubation period is 1 day. Correspondingly, we will have 100 values of the contact rate at each timestep, $\beta_i(t_n) = R_0(t_n)\,\gamma_i(t_n)$. We then performed 100 iterations of the fourth-order Runge-Kutta method, one for each set of $\alpha_i$, $\beta_i$, and $\gamma_i$, to obtain 100 values for the next timestep, $t_{n+1}$. Finally, the mean of these 100 values at $t_{n+1}$ was taken.
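The sampling-and-averaging step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code. The right-hand side in `derivatives` is an assumed SEIR-type system chosen to be consistent with Figure 1 and the variable definitions; the rejection-based truncation at 1 day (applied here to both periods), all function names, and the phase-3 value of $R_0$ in the usage note are likewise assumptions made for illustration.

```python
import random

def erlang2(rng, mean, minimum=1.0):
    """Draw a period (in days) from an Erlang distribution of shape 2.

    Assumption: draws below `minimum` are rejected and redrawn.
    """
    while True:
        value = rng.gammavariate(2, mean / 2.0)  # shape 2, scale = mean / shape
        if value >= minimum:
            return value

def derivatives(y, alpha, beta, gamma, rho, c):
    """Assumed right-hand side for (S, E, I_H, I_C, R_H, R_C, D_H, D_C)."""
    s, e, i_h, i_c, r_h, r_c, d_h, d_c = y
    n = s + e + i_h + i_c + r_h + r_c   # effective total population (deaths excluded)
    new_exposed = beta * s * i_c / n    # only non-isolated cases transmit
    return (
        -new_exposed,
        new_exposed - alpha * e,
        rho * alpha * e - gamma * i_h,
        (1 - rho) * alpha * e - gamma * i_c,
        (1 - c) * gamma * i_h,
        (1 - c) * gamma * i_c,
        c * gamma * i_h,
        c * gamma * i_c,
    )

def rk4_step(y, h, params):
    """One classical fourth-order Runge-Kutta step of size h."""
    def nudge(k, scale):
        return tuple(yi + scale * ki for yi, ki in zip(y, k))
    k1 = derivatives(y, *params)
    k2 = derivatives(nudge(k1, h / 2), *params)
    k3 = derivatives(nudge(k2, h / 2), *params)
    k4 = derivatives(nudge(k3, h), *params)
    return tuple(
        yi + h / 6 * (a + 2 * b + 2 * c_ + d)
        for yi, a, b, c_, d in zip(y, k1, k2, k3, k4)
    )

def stochastic_step(y, h, r0, rng, rho=0.1, c=0.05, draws=100):
    """Advance one timestep: `draws` sampled parameter sets, then the mean state."""
    outcomes = []
    for _ in range(draws):
        alpha = 1.0 / erlang2(rng, 5.2)  # reciprocal of the incubation period
        gamma = 1.0 / erlang2(rng, 3.0)  # reciprocal of the infectious period
        beta = r0 * gamma                # contact rate from beta = R0 * gamma
        outcomes.append(rk4_step(y, h, (alpha, beta, gamma, rho, c)))
    return tuple(sum(col) / draws for col in zip(*outcomes))

def simulate(population, r0_phases, t1, t2, days=250, h=1 / 3, seed=0):
    """Run the three-phase simulation and return the state at every step."""
    rng = random.Random(seed)
    # Everyone susceptible, plus a single non-isolated infected person.
    y = (float(population), 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
    history = [y]
    for step in range(round(days / h)):
        t = step * h
        r0 = r0_phases[0] if t < t1 else r0_phases[1] if t < t2 else r0_phases[2]
        y = stochastic_step(y, h, r0, rng)
        history.append(y)
    return history
```

A West Coast run might then look like `simulate(53e6, (2.7, 2.3, 2.1), t1=45, t2=60)`, where 2.7 and 2.3 are the lower phase-1 and phase-2 values quoted above and 2.1 is only a placeholder for the phase-3 value. Because the effective total population is recomputed inside `derivatives`, it is refreshed at every Runge-Kutta stage.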
It is also important to note that at every iteration, the effective total population $N$ should be updated. In summary, our chosen values are organized in the following table.

Using the elements outlined in the previous sections, we ran 50 simulations of the model for the West Coast, the East Coast, and the entire US. We then compared our simulation results to current data between January 20 and March 19 [2, 13], assuming a delay in reporting time of about 8 to 10 days and using population estimates derived from [4].

The first case of COVID-19 on the West Coast was reported on January 20, 2020 in Seattle, WA. We used a population estimate of 53 million. If January 20 is designated Day 0, we assume Phase 2 started at around Day 45, and Phase 3 started at around Day 60. The simulation results suggest that, under the quasi-worst-case-scenario, the number of people infected by COVID-19 is far higher than reported: about 80% of $I_C$ (those infected but not isolated) are not accounted for. This is plausible, as many symptoms of COVID-19 may be passed off as just cases of mild flu. In the quasi-worst-case-scenario, we predict the number of reported infections at its peak to be about 25,000 (Fig. 2) and the actual number of infections to be about 90,000 (Fig. 3).
The first case of COVID-19 on the East Coast was reported on February 1, 2020 in Boston, MA. However, since no other new cases were reported for about two weeks, we assume a starting time of one month later than for the West Coast. In the graphs below, we will still plot them starting from the first reported case in the US (i.e. January 20). For the East Coast, we used a population estimate of 152 million. Based on government actions, we also assume that the starting dates for Phases 2 and 3 are about a week behind those of the West Coast. The simulation results are similar to those of the West Coast; for a quasi-worst-case scenario, the number of reported infections is predicted to be about 400,000 (Fig. 4), and the actual number of infections about 1,450,000 (Fig. 5).

Each figure comprises 50 different simulations that are represented by the "fuzz" lines. On the x-axis, the starting date at the origin is January 12, 2020, and 90 represents 90 steps of 8 hours. The blue circles represent the actual data of reported cases, adjusted for delay. The left figure estimates the total number of reported infections over time using the formula $I_H + 0.2\,I_C$, and the right figure is a magnification that includes available data to date.

As COVID-19 has escalated into a pandemic, we also considered that a greater number of people would fall ill due to the lack of prior immunity.
Considering the previous pandemic, the 2009 Influenza A (H1N1) pandemic, in which around 24% of people were infected [14], as well as the non-static nature of our variables, we ran another simulation on the entire US population with approximately 22% to 26% of the population infected (based on randomness) to predict such a scenario. For this simulation experiment, we used values for $R_0$ and starting days of the phases that were a rough weighted estimate of the two coasts, but used the same values for the other variables as described in the previous section. We also assumed five infected individuals were introduced into the population. As Fig. 6 illustrates, in this quasi-worst-case-scenario, about 3.8 million of the US population would die from COVID-19 (assuming a death rate of 5%), and the remaining 76.2 million would recover from the virus.

This paper presents a model for simulating the scale of the COVID-19 pandemic in the United States. The simulations focused on the West and East Coasts (Figs. 2-5), and also modeled a hypothetical worst-case scenario for the entire US (Fig. 6). The model implemented factors specific to the US, which include the three phases of government response, travel restrictions that restrict cross-boundary transmissions, and limits to testing kits and healthcare accessibility resulting in semi-strict isolation. The model fits the reported data by assuming that about 80% of the non-isolated cases are not reported (Figs. 2, 4); however, the actual number of cases may be much higher (Figs. 3, 5). That assumption agrees with a prior study [17] and provides further evidence that COVID-19 is not easy to contain.
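As a rough consistency check on the reporting assumption, the relation between reported and actual infections can be sketched in Python. This is an illustrative calculation only: it assumes that the isolated and non-isolated infected pools stay close to the $\rho : (1 - \rho)$ split implied by the pre-testing proportion, that all isolated cases are reported, and that 20% of non-isolated cases are reported. The function names are hypothetical.

```python
def reported_fraction(rho=0.1, community_reporting=0.2):
    """Fraction of current infections that would be counted as reported.

    Assumes isolated cases (share rho) are all reported and only a share
    `community_reporting` of non-isolated cases (share 1 - rho) is reported.
    """
    return rho + community_reporting * (1.0 - rho)

def implied_reported(actual_infections, rho=0.1, community_reporting=0.2):
    """Reported count implied by a given number of actual infections."""
    return actual_infections * reported_fraction(rho, community_reporting)
```

With $\rho = 0.1$ the fraction is $0.1 + 0.2 \times 0.9 = 0.28$, so 90,000 actual infections imply about 25,200 reported and 1,450,000 imply about 406,000, close to the roughly 25,000 and 400,000 reported infections estimated earlier for the West and East Coasts.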
The model also predicts that the peak of the outbreak would occur by early April, and the outbreak would wind down by the start of July. However, the model does not account for delays in reporting; as such, one should expect the peak reported number of infections to occur around mid-April. In addition, the peak would occur sooner on the West Coast, which was affected first. However, all peaks should occur around mid-April unless there are major changes to the reproduction rates, for which there is no current evidence. It should be noted that the trends in the model's estimates share similarities with the epidemic curves during the SARS and MERS outbreaks [32, 34].

One way to mitigate the spread of COVID-19 is to successfully isolate more infected people, i.e. to have a higher proportion of $I_H$ relative to $I_C$. That outcome should be achieved when defective testing kits are replaced and more testing kits are allocated, so that more exposed people can be tested and isolated accordingly. For instance, if 50% more people had been tested ($\rho = 0.15$) since the start of the outbreak, the model would predict 40% of the projected severity

However, the sudden onset of COVID-19 and the overall unpreparedness of the United States make these data inaccessible. It should also be noted that while the global mortality during COVID-19 is currently 3.8%, it is noticeably different across nations [24]. As a result, we chose a 5% mortality rate ($c = 0.05$) in our quasi-worst-case scenario prediction of COVID-19 in the US. Other factors at a molecular level can further modulate our variables.
For instance, the actual $R_0$ in the US might be different, as the transmissibility and infectivity of CoVs have been shown to be modulated by abiotic factors such as temperature and humidity [1], and the robustness of host immunity further affects the etiology, pathogenesis, and overall virulence of CoVs [9]. As a result, our paper used numbers from prior studies based on other countries hit relatively hard by COVID-19, such as China, South Korea, and Italy, which enabled us to construct a quasi-worst-case model of COVID-19 in the US.

We are mindful that computational simulations are, by their very nature, approximations. There are currently no predictive models that satisfactorily produce a picture of the spread or clinical impact of the disease, as too many variables can affect the spread of a disease, especially for super-spreaders like COVID-19. Moreover, simulations can vary noticeably with small variations in assumptions and parameters. To cite one example, minor changes in the distribution of the infected population between $I_H$ and $I_C$ can dramatically affect the model's predictions. Also, at the societal level, the length of time that the cohort of patients remains hospitalized is often unknown in the early stages of a pandemic, and can greatly influence the use and deployment of medical supplies and personnel, thus altering the course of the infection and recovery rates. Nevertheless, the few scenarios presented and discussed in this paper strongly suggest that even with current containment and mitigation efforts, the COVID-19 outbreak will significantly impact the health of the US population.
respectively. Note that more reported cases are initially observed than originally projected because testing many more people shifts potentially unreported $I_C$ to the $I_H$ cohort. Two peaks are observed for the West Coast: the original peak is projected to have already passed, and implementing 3X more tests will result in another peak of reported infections.

Effects of Air Temperature and Relative Humidity on Coronavirus Survival on Surfaces
COVID-19) Situation Summary
International Locations with Confirmed COVID-19 Cases
The United States Census Bureau.
2020, City and Town Population Totals
Origin and evolution of pathogenic coronaviruses
Identification of a Novel Coronavirus in Patients with Severe Acute Respiratory Syndrome
Capturing the time-varying drivers of an epidemic using stochastic dynamical systems
An Overview of Their Replication and Pathogenesis
SARS coronavirus and innate immunity
Clinical Characteristics of Coronavirus Disease 2019 in China
First Case of 2019 Novel Coronavirus in the United States
Differential Equations: A Dynamical Systems Approach
Novel Corona Virus
Estimating age-specific cumulative incidence for the 2009 influenza pandemic: a meta-analysis of A(H1N1)pdm09 serological studies from 19 countries
A Novel Coronavirus Associated with Severe Acute Respiratory Syndrome
The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)
A conceptual model for the coronavirus disease 2019 (COVID-19) outbreak in Wuhan, China with individual reaction and governmental action
Incubation Period and Other Epidemiological Characteristics of 2019 Novel Coronavirus Infections with Right Truncation: A Statistical Analysis of Publicly Available Case Data
Time-varying transmission dynamics of Novel Coronavirus Pneumonia in China
Transmission dynamics of 2019 novel coronavirus
The reproductive number of COVID-19 is higher compared to SARS coronavirus
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Coronavirus Disease (COVID-19) - Statistics and Research
Wuhan seafood market may not be source of novel virus spreading globally
Genetic Recombination, and Pathogenesis of Coronaviruses
The New York Times. Estimates Fall Short of F.D.A.'s Pledge for 1 Million Coronavirus Tests
The New York Times. US Virus Testing Faces New Headwind: Lab Supply Shortages.
2020
Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures
A deterministic epidemic model for the emergence of COVID-19 in China
World Health Organization. MERS-CoV maps and epicurves
Novel Coronavirus (2019-nCoV) situation reports
Isolation of a Novel Coronavirus from a Man with Pneumonia in Saudi Arabia
A Novel Coronavirus from Patients with Pneumonia in China
Preliminary estimating the reproduction number of the coronavirus disease (COVID-19) outbreak in Republic of Korea and Italy by 5

We thank Professor Bruce Ganem (Department of Chemistry and Chemical Biology) and Professor John Muckstadt (School of Operations Research and Industrial Engineering) of Cornell University for their encouragement, invaluable advice and dedicated guidance throughout the study.

YYY conceived the study and obtained the data. YRY and WJY designed and implemented the model and interpreted the results while YYY provided biological context. YYY, YRY and WJY wrote the manuscript. All authors contributed equally to the project.

The authors declare that they have no competing interests.