key: cord-0452164-fttwshmb authors: Dandekar, Raj; Barbastathis, George title: Neural Network aided quarantine control model estimation of global Covid-19 spread date: 2020-04-02 journal: nan DOI: nan sha: ebcab70ad3b2030adf82c372682b4b22a3ce678b doc_id: 452164 cord_uid: fttwshmb

Since the first recording of what we now call Covid-19 infection in Wuhan, Hubei province, China on Dec 31, 2019, the disease has spread worldwide and met with a wide variety of social distancing and quarantine policies. The effectiveness of these responses is notoriously difficult to quantify as individuals travel, violate policies deliberately or inadvertently, and infect others without themselves being detected. In this paper, we attempt to interpret and extrapolate from publicly available data using a model that combines first-principles epidemiological equations with a data-driven neural network. Leveraging our neural network augmented model, we focus our analysis on four locales: Wuhan, Italy, South Korea and the United States of America, and compare the role played by the quarantine and isolation measures in each of these locales in controlling the effective reproduction number $R_{t}$ of the virus. Our results unequivocally indicate that the countries in which rapid government interventions and strict public health measures for quarantine and isolation were implemented were successful in halting the spread of infection and preventing it from exploding exponentially. We test the predictive ability of our model by comparing its predictions with recorded data for the period 3 March - 1 April 2020 for Wuhan and for the period 25 March - 1 April 2020 for Italy and South Korea. In the case of the US, our model captures well the current growth of the infected curve and predicts a halting of infection spread by 20 April 2020.
We further demonstrate that relaxing or reversing quarantine measures right now will lead to an exponential explosion in the infected case count, thus nullifying the role played by all measures implemented in the US since mid March 2020.

The coronavirus respiratory disease 2019, caused by the virus SARS-CoV-2 (Chan et al. 2020; CDC 2020), has led to a global pandemic, with 823,626 confirmed cases in more than 200 countries as of April 1, 2020 (WHO 2020). As the disease began to spread beyond its apparent origin in Wuhan, the responses of local and national governments varied considerably. The evolution of infections has been similarly diverse, in some cases appearing to be contained and in others reaching catastrophic proportions. In Hubei province itself, starting at the end of January, more than 10 million residents were quarantined by shutting down public transport systems, train and airport stations, and imposing police controls on pedestrian traffic. Subsequently, similar policies were applied nation-wide in China. By the end of March, the rate of infections was reportedly receding (Cyranoski 2020). Taiwan, Hong Kong, and Singapore managed to maintain fairly low infection rates throughout, even though a second wave of infections is appearing in Singapore, perhaps due to incoming repatriates (Lin & Wang 2020). South Korea, Iran, Italy, and Spain experienced acute initial increases, but then adopted drastic generalized quarantine. This did result in an apparent recession of the spread in South Korea, whereas in the other three countries the effect of the policies is not yet clear. In the United States, where both the onset of widespread infections and government responses were comparatively delayed, infection growth currently appears to be explosive. As of April 2, 2020, the United States has the highest number of infected cases (∼227k) globally.
Given the available data by country and world-wide, there is an urgent need to use data-driven approaches to quantitatively estimate and compare the role of the quarantine policy measures implemented in several countries in curtailing the spread of the disease. Existing models analyzing the role of travel restrictions in the spread of Covid-19 either used parameters based on prior knowledge of SARS/MERS coronavirus epidemiology and not derived independently from the Covid-19 data (Chinazzi et al. 2020), or were not implemented on a global scale (Kraemer et al. 2020). In this paper, we propose augmenting a first principles-derived epidemiological model with a data-driven module, implemented as a neural network. We leverage this model to analyze and compare the role of quarantine control policies employed in Wuhan, Italy, South Korea and USA in controlling the virus's effective reproduction number $R_t$ (Read et al. 2020; Tang et al. 2020; Li et al. 2020a; Wu & Leung 2020; Kucharski et al. 2020; Ferguson et al. 2020). In the original model, known as SEIR (Fang et al. 2006; Saito et al. 2013; Smirnova et al. 2019), the population is divided into the susceptible S, exposed E, infected I and recovered R groups, and their relative growths and competition are represented as a set of coupled ordinary differential equations. The simpler SIR model does not account for the exposed population E. These models cannot capture the large-scale effects of more granular interactions, such as the population's response to social distancing and quarantine policies. This is where data come in: in our approach, a neural network added as a nonlinear function approximator (Rackauckas et al. 2020) informs the infected variable I in the SIR model. This neural network encodes information about the quarantine strength function in the locale where the model is implemented.
The neural network is trained from publicly available infection and population data for Covid-19 for a specific region under study; details are in the Materials and Methods section. Thus, our proposed model is globally applicable and interpretable with parameters learnt from the current Covid-19 data, and does not rely upon data from previous epidemics like SARS/MERS. Since neural networks can be used to approximate nonlinear functions with a finite set of parameters, they serve as a powerful tool to approximate quarantine effects in combination with the analytical epidemiological models. The downside is that the internal workings of a neural network are difficult to interpret. The recently emerging field of Physics-Informed Neural Networks (Raissi et al. 2019) exploits conservation principles, SIR in our case, to mitigate overfitting and other related machine learning risks.

All four regions that we applied our model to have developed infected and exposed populations that are sufficiently large to train our models. The first three are comparable in terms of population (11 million, 60 million and 52 million, respectively) and almost complete isolation from inbound travel, while the USA has a much larger population (327 million), with increasing travel restrictions since mid March 2020. Leveraging the insights gained through reliable prediction and estimation in Wuhan, South Korea and Italy, we make forecasting predictions regarding the infection spread in the USA, thus making our model informative for quarantine and social distancing policy guidelines and regulations.

Figure 1 shows results from the classical SEIR and SIR models applied to Wuhan data. Neither model can recover the stagnation seen in the actual infected number about 30 days after the detection of the 500th infected case in Wuhan, i.e. 24th January 2020. The neural network model trained to include quarantine, on the other hand, does predict this stagnation; see below.
Figures 2-7 show results of our neural network models trained to include quarantine effects in the Wuhan, Italy, and South Korea regions, and respective predictions for the evolution of infections for approximately one month past the end of training. We use two parameters to quantify the results: the effective reproduction number $R_t$ and the quarantine strength function Q(t). We trained the models using data starting from the dates when the 500th infection was recorded in each region: 24th January, 27th February, and 22nd February for Wuhan, Italy and South Korea, respectively; and up to about a month thereafter. The respective models, superimposed over the actual recorded data, are in Figures 2a, 4a and 6a, generally showing good agreement. The respective forecasts are in Figures 3, 5 and 7. As can be seen, the model with quarantine control included is able to capture the plateau in the infected case count, as opposed to the standard SIR and SEIR models shown in Figure 1. In the case of Wuhan, because the onset was earliest, our forecast can be compared with recorded infection data in the period from March 3rd to March 24th, also showing good agreement with the actual observations of infection stagnation leading to $R_t$ < 1 during that period (Cyranoski 2020).

In the case of the US, similarly to the other regions, i.e. starting when the 500th infection was recorded on March 8th, we trained the model up to the latest available data, i.e. until 1 April 2020. The infected case count estimated by our model shows a good match with the actual data (Figure 8a). Forecasting results for Q(t) and $R_t$ for a period of 1 month following the current US policy are in Figure 9. In Figure 10a we forecast the number of infections the US would experience starting from 1st April if the US were to follow its current quarantine policy, as opposed to gradually adopting the respective quarantine models learnt from the more reliable Wuhan, Italy and South Korea data.
We arbitrarily set the adjustment period to 17 days, i.e. until 17 April 2020. Figure 10b shows the effective quarantine rates Q(t) obtained from (4.17) that would apply to the US during the respective adjustment period. Details are in the Materials & Methods section.

The results show a generally strong correlation between strengthening of the quarantine controls, i.e. increasing Q(t) as learnt by the neural network model; actions taken by the regions' respective governments; and decrease of the effective reproduction number $R_t$. For example, in Italy, government restrictions reportedly (Ghiglione et al. 2020) tightened during the week preceding mid March, which is also when our model shows a sharp increase in Q(t) and corresponding decrease in $R_t$ (Figures 4b, c). For Wuhan and South Korea, similar cusps in government interventions took place earlier, in the weeks leading to and after the end of January (Cyranoski 2020) and February (Normille 2020), respectively. These cusps were also captured well by our model (Figures 2b, c and 6b, c, respectively). Even for the USA, Q(t) shows a stagnation until 20 March 2020, after which it shows a sharp increase accompanied by a decrease in $R_t$ (Figures 8b, c), which is in alignment with the ramping up of government policies and quarantine interventions after mid March in the worst affected states like New York, New Jersey, California, and Michigan.

A comparative analysis of the quarantine strength function Q(t) learned by the neural network for different countries reveals that Wuhan had the highest magnitude and South Korea had the highest growth rate of Q(t). This can be attributed to the stringent government interventions and strict public health measures, including immediate isolation and quarantine impositions, in Wuhan and South Korea.
This eventually resulted in the halting of infection spread and a corresponding $R_t$ < 1 within a month for Wuhan (Figure 2c) and within 20 days for South Korea (Figure 6c) after the first signs of a pandemic were recognized. It is reported that the infected case count stagnated nation-wide in China by the beginning of March (Cyranoski 2020) and in South Korea by the end of March (Fisher & Sang-Hun 2020), which eventually led to a stagnation in the quarantine interventions employed in these countries. This is in general qualitative agreement with our forecasting results, which show a plateau in Q(t) and $R_t$ at $R_t$ < 1 (Figures 3a, b and 7a, b).

In Italy, as of March 20th, I(t) appears to be linear (Figure 4a), which is consistent with lower rates of infections being actually reported (Horowitz & Kirkpatrick 2020) and can be taken as a precursor to stagnation. It is also consistent with adoption of strict movement restrictions by the government shortly before March 20th. We forecast that, for Italy, $R_t$ will drop below 1 and both Q(t) and $R_t$ will stagnate between mid and end of April 2020 (Figures 5a, b), indicating halting of the spread of infection.

Owing to the relaxed quarantine and isolation policies in the US in the initial stages after the infection began to spread, our model converges to Q(t) ≈ 0.4 − 0.6 (Figure 8b), which is the smallest among the regions studied. Even though the effective $R_t$ is still greater than 1 as of April 1, 2020 (Figure 8c), its growth has started to show a decreasing trend and we expect the infection to start showing stagnation with $R_t$ < 1 by 20 April 2020 if the current US policies continue without change (Figures 9b, 10a). This will be accompanied by a continuous ramp up of quarantine policies (Figure 9a). At its peak, we forecast the infected count to reach approximately 600,000 before stagnation, again assuming no change in US policies.
Our mixed model analysis for the USA, employing Q(t) learnt from the models of Wuhan, Italy and South Korea in the USA model starting from 1 April 2020, reveals that stronger quarantine policies (Figure 10b) might lead to an accelerated plateauing in the infected case count, as shown in Figure 10a, and subsequently a smaller infected case count. On the other hand, in agreement with National Institute of Allergy and Infectious Diseases estimates (Miller 2020), we forecast that relaxing or abandoning the quarantine policies gradually over the period of the next 17 days may well lead to ∼1 million infections without any stagnation in the infected case count (Figure 10a) by mid April 2020.

The classic SEIR epidemiological model has been employed in a number of prior studies, such as for the SARS outbreak (Fang et al. 2006; Saito et al. 2013; Smirnova et al. 2019) as well as the Covid outbreak (Read et al. 2020; Tang et al. 2020; Wu & Leung 2020). The entire population is divided into four sub-populations: susceptible S; exposed E; infected I; and recovered R. The sub-populations' evolution is governed by the following system of four coupled nonlinear ordinary differential equations (Smirnova et al. 2019; Wang et al. 2020)

$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \tag{4.1}$$
$$\frac{dE(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \sigma E(t) \tag{4.2}$$
$$\frac{dI(t)}{dt} = \sigma E(t) - \gamma I(t) \tag{4.3}$$
$$\frac{dR(t)}{dt} = \gamma I(t) \tag{4.4}$$

Here, β, σ and γ are the exposure, infection and recovery rates, respectively, and are assumed to be constant in time. The total population N = S(t) + E(t) + I(t) + R(t) is seen to remain constant as well; that is, births and deaths are neglected. The recovered population is to be interpreted as those who can no longer infect others; so it also includes individuals deceased due to the infection. The possibility of recovered individuals becoming reinfected is accounted for by SEIS models (Mukhopadhyay & Bhattacharyya 2008), but we do not use this model here, as the reinfection rate for Covid-19 survivors is considered to be negligible as of now.
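As an illustration of the SEIR system described above, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard SEIR form with a βSI/N incidence term, integrates it with a hand-written fourth-order Runge-Kutta step, and uses the Wuhan initial conditions quoted in the text (S = 11 million, I = 500, E = 20 × I, R ≈ 10). The function names and the values of β, σ and γ are hypothetical placeholders chosen only to make the example run; they are not fitted parameters.

```python
def seir_rhs(state, beta, sigma, gamma):
    """Right-hand side of the (assumed standard) SEIR equations."""
    s, e, i, r = state
    n = s + e + i + r  # total population, constant because the terms cancel
    new_exposed = beta * s * i / n
    return (
        -new_exposed,              # dS/dt
        new_exposed - sigma * e,   # dE/dt
        sigma * e - gamma * i,     # dI/dt
        gamma * i,                 # dR/dt
    )


def rk4_step(state, dt, beta, sigma, gamma):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = seir_rhs(state, beta, sigma, gamma)
    k2 = seir_rhs(shifted(k1, dt / 2), beta, sigma, gamma)
    k3 = seir_rhs(shifted(k2, dt / 2), beta, sigma, gamma)
    k4 = seir_rhs(shifted(k3, dt), beta, sigma, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_seir(initial, beta, sigma, gamma, days, steps_per_day=10):
    """Return the daily states (S, E, I, R), starting with day 0."""
    dt = 1.0 / steps_per_day
    state = tuple(float(x) for x in initial)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    i0 = 500.0
    initial = (11e6, 20 * i0, i0, 10.0)  # (S, E, I, R) at t = 0, from the text
    # beta, sigma, gamma below are illustrative values, not estimates.
    for day, (s, e, i, r) in enumerate(simulate_seir(initial, 0.9, 0.2, 0.1, 60)):
        if day % 10 == 0:
            print(f"day {day:2d}: I = {i:12.1f}, R = {r:12.1f}")
```

Because nothing in these equations removes individuals from the infected compartment other than recovery, a run with constant rates keeps growing until susceptibles are depleted, which is the qualitative behaviour the text contrasts with the observed early plateau.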
The simpler SIR model neglects exposure, assuming instead direct transition from susceptible to infected; it is described by three coupled nonlinear ordinary differential equations as

$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \tag{4.5}$$
$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \gamma I(t) \tag{4.6}$$
$$\frac{dR(t)}{dt} = \gamma I(t) \tag{4.7}$$

Here, β is the infection rate. The reproduction number $R_t$ in the SEIR and SIR models is defined as

$$R_t = \frac{\beta}{\gamma} \tag{4.8}$$

An important assumption of the SEIR and SIR models is homogeneous mixing among the subpopulations. Therefore, they cannot account for social distancing or mass quarantine effects. Additional assumptions are uniform susceptibility and disease progress for every individual, and that no spreading occurs through animals or other non-human means. Alternatively, the models may be interpreted as quantifying the statistical expectations on the respective mean populations, while deviations from the model's assumptions contribute to statistical fluctuations around the means. We applied both SEIR and SIR models to the case of Wuhan only. The results are shown in Figure 1 and verify that these models fail to predict the early arrest of infectious spread due to quarantine policies.

Initial conditions. The starting point t = 0 was the day on which 500 infected cases were detected, i.e., I(t = 0) = 500. The initial number of susceptible individuals was S(t = 0) = 11 million, Wuhan's population. The initial exposed population was assumed to be E(t = 0) = 20 × I(t = 0), in accordance with (Read et al. 2020; Wang et al. 2020), and the number of recovered individuals was set to a very small value R(t = 0) ≈ 10.

Parameter estimation. The time-resolved data for the infected, I data, and recovered, R data, case counts for Covid-19 were obtained from the Chinese National Health Commission. The optimal values of the parameters β, γ in the SIR model and β, σ, γ in the SEIR model were obtained by performing a local adjoint sensitivity analysis (Cao et al. 2003; Rackauckas et al.
2019) of the ODE problems in these models, by minimizing the mean square error loss function L(β, σ, γ) defined as

$$L(\beta, \sigma, \gamma) = \left\| \log(I(t)) - \log(I_{\mathrm{data}}(t)) \right\|^2 + \left\| \log(R(t)) - \log(R_{\mathrm{data}}(t)) \right\|^2 \tag{4.9}$$

The minimization procedure was performed using the ADAM optimizer (Kingma & Ba 2014) for 500 to 1000 iterations. The optimum values were used to produce Figure 1.

To study the effect of quarantine control globally, we start with the SIR epidemiological model. This choice minimizes the number of free parameters in the model and helps avoid overfitting. To include quarantine control in the modelling, we augment the SIR model by introducing a time-varying quarantine strength term Q(t) and a quarantined population T(t). Thus, the new effective reproduction number becomes

$$R_t = \frac{\beta}{\gamma + Q(t)} \tag{4.10}$$

Since Q(t) does not follow from first principles and is highly dependent on local quarantine policies, we devised a neural network-based approach to approximate it. Recently, it has been shown that neural networks can be used as function approximators to recover unknown constitutive relationships in a system of coupled ordinary differential equations (Rackauckas et al. 2019, 2020). Following this principle, we represent Q(t) as an n-layer-deep neural network with weights $W_1, W_2, \ldots, W_n$, activation function r and the input vector U = (S(t), I(t), R(t), T(t)) as

$$Q(t) = r\left(W_n\, r\left(W_{n-1} \cdots r\left(W_1 U\right)\right)\right) \equiv \mathrm{NN}(W, U) \tag{4.11}$$

For the implementation, we choose an n = 2-layer densely connected neural network with 10 units in the hidden layer and the ReLU activation function. We made this choice because we found sigmoidal activation functions to stagnate. The final model was described by 63 tunable parameters. The neural network architecture schematic is shown in the attached Supplementary Information. The governing coupled ordinary differential equations for the augmented SIR model are

$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \tag{4.12}$$
$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \left(\gamma + Q(t)\right) I(t) = \frac{\beta\, S(t)\, I(t)}{N} - \left(\gamma + \mathrm{NN}(W, U)\right) I(t) \tag{4.13}$$
$$\frac{dR(t)}{dt} = \gamma I(t) \tag{4.14}$$
$$\frac{dT(t)}{dt} = Q(t)\, I(t) = \mathrm{NN}(W, U)\, I(t) \tag{4.15}$$

The quarantined population is initialized to a very small value T(t = 0) ≈ 10 and, thereafter, the neural network learns how to approximate it based on the local data from each region under study.

Initial conditions. The starting point t = 0 for each simulation was the day on which 500 infected cases were detected, i.e. I(t = 0) = 500. The number of susceptible individuals was assumed to be equal to the respective regional populations, i.e. S(t = 0) = 11 million, 60 million, 52 million and 327 million for Wuhan, Italy, South Korea and USA, respectively. Also, in all simulations, the number of recovered individuals was initialized to a small number R(t = 0) ≈ 10.

Parameter estimation. The time-resolved data for the infected, I data, and recovered, R data, case counts for each locale considered are obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The neural network-augmented SIR ODE system was trained by minimizing the mean square error loss function

$$L_{\mathrm{NN}}(W, \beta, \gamma) = \left\| \log(I(t)) - \log(I_{\mathrm{data}}(t)) \right\|^2 + \left\| \log(R(t)) - \log(R_{\mathrm{data}}(t)) \right\|^2 \tag{4.16}$$

that includes the neural network's weights W. Minimization was carried out through local adjoint sensitivity analysis (Cao et al. 2003; Rackauckas et al. 2019), following a procedure similar to that outlined in Rackauckas et al. (2020), and implemented using the ADAM optimizer (Kingma & Ba 2014) for 300 to 500 iterations. To avoid over-fitting, the training is stopped when the loss function L is seen to stagnate and the first derivatives I′(t), R′(t) are seen to match those of the data.

[...] (Lyons 2020), leads to a smaller recovery rate and hence a smaller fraction of the population being transferred from the infected compartment to the recovered compartment in the model described in (4.12)-(4.15). Thus, simultaneous training of W, β, γ as described above leads to our model over-estimating the infected case count.
As a result, for the USA, we first find the optimal γ by minimizing (4.16), and then use this value of γ when minimizing the loss function

$$L(W, \beta) = \left\| \log(I(t)) - \log(I_{\mathrm{data}}(t)) \right\|^2$$

Such an independent optimization procedure for estimating γ may lead to small errors in the estimation of R(t). These errors are seen to be negligible in this case (Figure 8a), thus validating this approach for the USA.

The estimates of β, γ, Q(t) obtained using this procedure are shown in Table 1. Table 1 also shows the intervention efficiency, defined as the number of days elapsed between detection of the 500th case and the first time when the effective reproduction number reached $R_t$ < 1 in the chosen locale.

Forecasts in Figures 3, 5 and 7 were obtained by using the sub-population data on the final days of their respective training periods to initialize the trained neural network models for Wuhan, Italy and Korea. For Figure 10, the forecasts were obtained by similarly initializing the model but subsequently, in the period after April 1st, adjusting the quarantine model gradually over 17 days, until 17 April, according to (4.17), where j = Wuhan, Italy, Korea for the respective assumed quarantine policy adoptions.

The authors declare no conflicts of interest.

Table 1: Infection and recovery rates β and γ, respectively, and the range of the quarantine strength function Q(t), obtained from minimising (4.16), along with the intervention efficiency, defined as the number of days elapsed between detection of the 500th case and the first time when the effective reproduction number reached $R_t$ < 1.

Data collection. Data for the infected and recovered case count in Wuhan is obtained from the data released by the Chinese National Health Commission. Infected and recovered count data for Italy, South Korea and USA is obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
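The quarantine-augmented SIR model and the intervention efficiency described in the Materials and Methods can be illustrated with a small Python sketch. This is a sketch under stated assumptions, not the authors' code: the network weights are random rather than trained, bias terms are included, the inputs (S, I, R, T) are divided by the total population before entering the network, integration uses a plain forward Euler step, and $R_t$ is taken to be β / (γ + Q(t)), which is consistent with infected individuals leaving their compartment at rate γ + Q(t). All function names and parameter values are hypothetical. Under the bias assumption, a network with 4 inputs, 10 hidden units and 1 output has 61 parameters, which together with β and γ gives 63, the count quoted in the text.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def init_network(n_inputs=4, n_hidden=10, seed=0):
    """Untrained 2-layer dense network; weights are random placeholders."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(0.0, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def count_parameters(net):
    w1, b1, w2, _ = net
    return sum(len(row) for row in w1) + len(b1) + len(w2) + 1  # +1 for b2


def quarantine_strength(net, u):
    """Q = ReLU(W2 ReLU(W1 u + b1) + b2), non-negative by construction."""
    w1, b1, w2, b2 = net
    hidden = [relu(sum(w * x for w, x in zip(row, u)) + b) for row, b in zip(w1, b1)]
    return relu(sum(w * h for w, h in zip(w2, hidden)) + b2)


def qsir_rhs(state, beta, gamma, net):
    """Derivatives of (S, I, R, T) and the current quarantine strength Q."""
    s, i, r, t = state
    n = s + i + r + t
    q = quarantine_strength(net, [x / n for x in state])  # normalised inputs (assumption)
    new_infected = beta * s * i / n
    return (-new_infected, new_infected - (gamma + q) * i, gamma * i, q * i), q


def simulate_qsir(initial, beta, gamma, net, days, steps_per_day=20):
    """Forward Euler integration; returns daily states and daily Q values."""
    dt = 1.0 / steps_per_day
    state = tuple(float(x) for x in initial)
    states = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            deriv, _ = qsir_rhs(state, beta, gamma, net)
            state = tuple(x + dt * dx for x, dx in zip(state, deriv))
        states.append(state)
    q_daily = [qsir_rhs(st, beta, gamma, net)[1] for st in states]
    return states, q_daily


def effective_reproduction_number(beta, gamma, q):
    return beta / (gamma + q)  # assumed form of R_t with quarantine


def intervention_efficiency(beta, gamma, q_daily):
    """First day index (t = 0 is the 500th case) with R_t < 1, else None."""
    for day, q in enumerate(q_daily):
        if effective_reproduction_number(beta, gamma, q) < 1.0:
            return day
    return None


if __name__ == "__main__":
    net = init_network()
    initial = (11e6, 500.0, 10.0, 10.0)  # (S, I, R, T) at t = 0, from the text
    states, q_daily = simulate_qsir(initial, 0.9, 0.1, net, 30)  # beta, gamma illustrative
    print("network parameters:", count_parameters(net))
    print("first day with R_t < 1:", intervention_efficiency(0.9, 0.1, q_daily))
```

In the paper the weights would be obtained by minimizing the log-scale loss against recorded data; here they are left random, so the printed numbers carry no epidemiological meaning and only show how the pieces fit together.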
Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution
CDC 2020 Coronavirus Disease 2019 (COVID-19) Situation Summary
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
The effect of travel restrictions on the spread of the 2019 novel coronavirus (Covid-19) outbreak. Science
CHP 2020 Centre for Health Protection of the Hong Kong Special Administrative Region Government. CHP closely monitors cluster of pneumonia cases on mainland
What China's coronavirus response can teach the rest of the world
Modelling the SARS epidemic by a lattice-based Monte-Carlo simulation
Impact of non-pharmaceutical interventions (NPIs) to reduce Covid-19 mortality and healthcare demand
How South Korea Flattened the Curve
Italian lockdown puts 16m people in quarantine
Dip in Italy's Cases Does Not Come Fast Enough for Swamped Hospitals
Adam: A method for stochastic optimization
The effect of human mobility and control measures on the Covid-19 epidemic in China
Early dynamics of transmission and control of Covid-19: a mathematical modelling study. The Lancet Infectious Diseases
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)
Singapore, Taiwan and Hong Kong face second wave of Coronavirus cases
Those recovered unknown, but coronavirus deaths rise in NY
Anthony Fauci predicts 'millions' of U.S. coronavirus cases, more than 100,000 deaths
Analysis of a spatially extended nonlinear SEIS epidemic model with distinct incidence for exposed and infectives
Coronavirus cases have dropped sharply in South Korea. What's the secret to its success
jl - A Julia library for neural differential equations
Universal differential equations for scientific machine learning
Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions
Extension and verification of the SEIR model on the 2009 influenza A (H1N1) pandemic in Japan
Forecasting epidemics through nonparametric estimation of time-dependent transmission rates using the SEIR model
Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
Phase-adjusted estimation of the number of coronavirus disease 2019 cases in Wuhan, China
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study