key: cord-0634327-73kgcv1k authors: Ganesan, Sashikumaar; Subramani, Deepak title: Spatio-temporal predictive modeling framework for infectious disease spread date: 2020-06-27 journal: nan DOI: nan sha: 488c8c24c9b5a636b278f8f2b30b751c26ce0071 doc_id: 634327 cord_uid: 73kgcv1k A novel predictive modeling framework for the spread of infectious diseases using high dimensional partial differential equations is developed and implemented. A scalar function representing the infected population is defined on a high-dimensional space and its evolution over all directions is described by a population balance equation (PBE). New infections are introduced among the susceptible population from non-quarantined infected population based on their interaction, adherence to distancing norms, hygiene levels and any other societal interventions. Moreover, recovery, death, immunity and all aforementioned parameters are modeled on the high-dimensional space. To epitomize the capabilities and features of the above framework, prognostic estimates of Covid-19 spread using a six-dimensional (time, 2D space, infection severity, duration of infection, and population age) PBE is presented. Further, scenario analysis for different policy interventions and population behavior is presented, throwing more insights into the spatio-temporal spread of infections across disease age, intensity and age of population. These insights could be used for science-informed policy planning. Epidemic modeling and forecasting has gained renewed interest since late 2019 when the world was affected by the novel coronavirus pandemic (named . Several computational studies to predict the human-to-human spread of Covid-19 have been reported [1] [2] [3] [4] . Most of these efforts have been based on compartmental models and stochastic models (including agent-based models) 5 . In compartmental models (e.g., SIR, SEIR, SEIRS, DELPHI 5, 6 ), the population is divided into different compartments and the dynamics of the different compartments are modeled by a system of coupled ordinary differential equations (ODE). Here, the interaction among compartments is usually deterministic, whereas random processes are used to model the spread of infections in stochastic models. Agent-based models are stochastic models that undertake a bottom-up approach of modeling individual members of a population and the dynamics of their interaction in terms of probabilities of movement and contact. More than the total number of infections, it is essential to have more insightful predictions, e.g., infected population distribution across their age and level of infection severity for science-informed policy intervention and public health planning. The population distribution over the duration of infection is crucial for planing antiviral treatments, quarantine, ventilator support and contact tracing. This requirement necessitates a comprehensive and computationally efficient predictive modeling framework. Even though these features could be incorporated in ODE-based compartmental and stochastic agent-based models, it is very complex and computationally expensive. To overcome these challenges, we propose a novel partial differential equation-based spatio-temporal predictive modeling framework for forecasting the spread of infectious disease in heterogeneous populations in open geographies. The roots for our model lie in the population balance equations that are popular in chemical engineering and process studies 7 . In the proposed model, the infected population density is defined as a scalar field on a high-dimensional space. Specifically for predicting the spread of Covid-19, a six-dimensional model is presented. The first three dimensions are the space and time, and the other three are the infection severity, duration of the infection (i.e., time since infection), and age of the population. New infections, impact of quarantine, testing, contact tracing, immunity, intervention policy impact, health infrastructure, recovery, and death are all modeled on this six-dimensional space based on data-driven functions (where available), and/or simple algebraic and integral functions. Notably, our PDE-based model in the present paper is more compact and a versatile description of the spread of the disease compared to compartmental models, and computationally efficient compared to agent-based models. To showcase the capability of our distribution-based predictive modeling framework for infectious disease spread, we apply it to model and predict the spread of Covid-19 in India. Further, we present a scenario analysis, which could be used to draw insights for policy interventions. Let T ∞ be a given final time and Ω := Ω x ⊗ Ω be the computational domain of interest. Here, Ω x ⊂ R 2 is the spatial domain defining the geographical region of interest and Ω ⊂ R n , where n is the number of internal directions. Each of the n-internal directions represents the property of the population on which a distribution needs to be predicted. Suppose the properties of interest are the infection severity, duration of the infection and age of the population, then a model with three internal directions could be used as follows. Let denotes the duration of infection, d ∞ is the maximum duration of infection, L a = [0, a ∞ ] denotes the age interval and a ∞ is the maximum age of the population. The infection index v ∈ L v quantifies the severity of the infection among the infected population. Specifically, the population with infection index v = 0 is completely disease-free, with v = 1 has maximum severity, with v ≥ v sym shows symptoms and those with v < v sym are asymptomatic. The duration of infection index d ∈ L d quantifies the time since a population has been exposed to and contracted the disease. Specifically, the population that just contracted the disease has d = 0. Typically, a person is asymptomatic until they reach v = v sym , and the duration elapsed d is the incubation period in which the disease is sub-clinical and that population is actively spreading the disease. After recovery, a population doesn't necessarily go to v = 0, rather they reach v < v reco . Let ∈ Ω x and ∈ Ω , be the infected number density function of the population. To describe the evolution of the active infected population size distribution, we propose the population balance equation in the time interval with initial conditions Here, u denotes the advection vector that quantifies the multiscale spatial movement of the population in a differential neighbourhood of Ω x (e.g., migrant laborers, daily commute for work, logistics-related travel, periodic gathering for religious and social events), n is the outward unit normal vector to Ω x , ∂ Ω − := {x ∈ ∂ Ω x | u · n < 0 }, g n is the flux that quantifies the net addition of the infected population into Ω x from outside (the spatial movement of the population across the border of the domain ∂ Ω x ), and I 0 is the initial distribution of the infected population. Further, G = (G v , G d , G a ) T is the internal growth vector, where Here, β is the immunity of the infected population, γ is the pre-medical history of the infected population and α is the effective treatment index. Next, we define the rate term C = C R +C ID , where C R (t, x, ) is a recovery rate function that quantifies the rate of recovery of the population from the infection, and C ID (t, x, ) is the infectious death rate. We also define a source term F = C T (t, x, ) that quantifies the point-to-point movement of infected population (e.g., by air, train etc) within Ω x , which are not included in u and g n . Moreover, C T and u need to be defined in such a way that the net internal movement of infected population within Ω x is conserved. Moreover, B nuc is the nucleation function that quantifies the infection transmission from the infected to the susceptible population and it is a function of several parameters as follows Here, X ∈ Ω, σ , H and S D are the interactivity, hygiene and social distancing indices respectively. Finally, the total population N(t) at a given time t ∈ (0, T ∞ ] is defined by Here, N S , N B , N R , N I , N Q N ID and N D are the number of susceptible, newborn, recovered, infected (symptomatic/asymptomatic), quarantined, infectious death and natural death populations, respectively. The given initial and boundary conditions and the above defined parameters close the population balance system. The proposed population balance model (1) is comprehensive and built on the basis of several parameters as defined above. In this section, we describe the modeling of each parameter. The nucleation term B nuc quantifies the new infection number density that is added to the system at d = 0 for all t, x, v , and a . Depending on how the susceptible population interacts with the infected population, new infections are added to the system. We call this addition as nucleation (borrowing the terminology from process engineering), which is modelled as Here, γ Q ∈ [0, 1] in (5) determines the fraction of the infected population in quarantine and it can be modeled as in equation (7). Further, the factor γ Q is dependent on the level of screening including testing, strictness of enforcing isolation and compliance of susceptible general public. Suppose γ Q = 1, i.e., if the entire infected population is kept under strict isolation, newly infected population will be zero and eventually there will be no spread of disease. However, due to economic, social and democratic reasons, implementing such a strategy is nearly impossible and there is bound to be spread, i.e., γ Q < 1. Moreover, the integral on the right-hand side of equation (5) is the total non-quarantined number density of the infected population at (t, x), and R is the rate at which the non-quarantined population infects the susceptible population. The factor R is modelled as in equation (6), where R 0 is the basic reproduction rate, Here, the interactivity index σ ∈ [0, 1], hygiene index H ∈ [0, 1], and social distancing index S D ∈ [0, 1]. Suppose σ = 0 then everything is under perfect lockdown and R → 0. In case S D = 1, everyone is following perfect social distancing and R → 0. Moreover, the newly infected population has to be added at different age ( a ) and infection ( v ) levels for which the factors f 4 and f 5 are introduced. We propose to use logistic functions fitted to data from literature for f 1 , f 2 , f 3 ,; the normalized demography at (t, x) for f 4 (t, x, a ), and a Gaussian mixture with two components so that maxima is at v sym and tails are proportional to the interval length over [0, v sym ] and [v sym , 1] for f 5 ( v ). In addition, the constant in f 5 is chosen such that the integral of f 5 over its support is one. This condition is imposed to ensure that R 0 can be interpreted as the basic reproduction rate used in standard epidemiological models 8 H c , S D c can be estimated from experimental and clinical evidence. Furthermore, in light of new evidence, the functional forms of f 1 to f 5 can easily be modified. Finally, σ (t), S D (t) and H(t) change over time due to increased awareness, government measures and compliance by people. The growth factor G v quantifies how the infected number density is advected along the direction of l v , that is, how the infection becomes mild to severe/critical and vice-versa in the infected population. We can model it as a function of the medical history, immunity of the population, which in turn are functions of the age l a , treatment and socio-economic status. Nevertheless, as a simple first order model, we propose a nonlinear function of the age, where K g is a non-dimensionalization factor, p is a power of nonlinearity and a risk is the age offset. In general, the recovery rate C R and infectious death rate C ID depend on v , and in turn are functions of hospital facilities, age, and health state of the population. These rates can be modeled directly from clinical data for all ordinates v , d , a . For the exact functional forms refer to the Supplementary Information Appendix. The initial number density I 0 (x, ) can be estimated directly from available official data at the day of starting the simulation. However, the data is available only in-terms of total number of tested and confirmed cases at a x-location and the dependence on needs to be estimated via appropriate data-driven and analytical functions. As such, first we utilize data from a period of 14 days, along with the log-normal distribution of incubation period 9 to calculate the initial number density N D (x) at all the spatial points x, but integrated over the three internal ordinates ( v , d , a ), i.e., where N i (x) is the data of tested and positive. For the distribution along the internal ordinates, we propose to use the following initial infection number density distribution Here, the first term in the square brackets is the normalized demography function (same as f 4 ), second term is the log-normal incubation period function with fitted 9 a 2 = 0.42 and b 2 = 1.62, and f 5 is same as before. To exhibit the capabilities of the proposed model, the forecast of Covid-19 spread in India is presented here. The numerical scheme and the fitted model parameters are given in the Supplementary Information Appendix. The proposed model and numerical schemes are implemented in our in-house finite element package 10, 11 and have been verified in our earlier studies with applications to process engineering 12, 13 . With the spread of Covid-19 in India, the federal government imposed a nation-wide lockdown from March 25, 2020. To simulate the spread of infections starting from March 23, 2020, the initial distribution of infected population is estimated using the data of active cases from March 23 to April 5 according to equations 9 and 10. Then the infection spread forecast for one year is computed by solving the PBE system (equation 1). Further, data until June 21, 2020 is utilized to select the parameters (e.g., S D , C R , C ID , γ Q ) that best explains the actual data. Thereafter, the control parameter S D is varied to perform scenario analyses as presented next. Different future scenarios are predicted by varying S D (t) based on the anticipated individual behavior (social distancing, hygiene practice, compliance to government rules etc.) and government policies (quarantine rules, lockdown rules etc.). The first scenario, named Current Trend follows business as usual assuming further relaxation to lockdown rules. A second variant named Better Scenario assumes better compliance in the social distancing and other measures to control the spread of the Fig. 1 show the comparison with actual data and thereby validate the model. In addition, the time series plots for other scenarios including a worse-case scenario can be found at IISc-Model website 14 . In order to compare the performance of all states in India with the national trend, a uniform set of parameters is used for state-wide computations. In particular, the parameters are fitted by minimizing the error between the national data and the sum of the respective state predictions. Fig. 2 shows the actual data and the computed distribution using the above set of parameters for the states of Karnataka and Maharashtra. We can see that Karnataka has done better, whereas Maharashtra has done worse compared to the national trend. These insights can be used by the authorities to introduce state-wise lockdown policies and to plan infrastructure for quarantine, treatments etc. The performance of other states can be seen at IISc-Model website 14 . Our PBE model in fact predicts the distribution of the infected population over all the internal ordinates v , d , a . In the previous section, we have shown only the total number of infected, recovered and deceased populations. Now, to showcase one of the unique features of the model, we present and discuss the predicted population distribution for the Sunday lockdown scenario. Figure 3 shows the predicted distribution of active Covid-19 infected population over the ordinates v , d and a at different time instances (day 60, 120, 180, 240, 300 and 365) with their corresponding dates. Predicting the severity of the infected population is crucial to plan the hospital requirements including antiviral treatments, quarantine, hospitalisation, ventilator support, and oxygen support. In particular, the information of asymptomatic and symptomatic infected population helps the policymakers to plan quarantine rules. Moreover, the death rate is a function of the severity of infection and is crucial to predict the causalities arising from the infection spread. The duration of infection plays a key role in epidemic modeling. Classical models usually assume a constant duration. However, the recovery and the death of the patient depend on the immunity, age and health of the patient, medical treatments etc and thus the duration of infection need not be a constant. In the proposed model, the duration of infection is considered as an independent internal ordinate. The recovered population distribution over the duration of infection and other internal ordinates for days 60, 120, 180, 240, 300 and 365 is shown in Figure 4 along with their corresponding dates. Crucially, the predicted distribution with duration of infection, especially at initial stages, is key to plan for testing and to make effective decisions on quarantine, hospitalization and discharging from hospitals. Finally, the age of the population is pivotal in epidemics like Covid-19 since it affects children and aged population severely. Therefore, it is incorporated into the proposed model as another independent ordinate. In fact, the newly infected population is added from the susceptible population across the age distribution through the nucleation term. Moreover, the response to the antiviral treatment, death and recovery rates depend on age-specific health complications such as diabetics, cardiovascular disease, can also be incorporated in the PBE model with appropriate functions that depend on the age of the population. Our spatio-temporal modeling framework is the first comprehensive partial differential equation model for predicting infectious disease spread. Computationally, our model is efficient compared to agent-based stochastic models. Mathematically, our PDE system is more compact and comprehensive compared to ODE-based compartmental models. Specifically, the PDE is a continuum description of the infected population whereas the compartmental models are a discrete representation. Crucially, in contrast to the existing models, our model provides an insight into the distribution of infected population (presented in previous sections). This information is important to plan policy interventions, especially in Covid-19 like pandemics. Not only prognostic estimates, but also diagnostic estimates for more detailed analysis using distribution can be performed with the proposed framework. With more data and employing data-driven and machine learning approaches, we could further refine the parameters and functional forms of different model components to derive more insightful predictions. For example, to derive insights into the reopening of the workplace and educational institutions, the nucleation and advection vector could be modeled to account for interactions between different age groups and movement of people between homes and these places. The potential options for refining the model are virtually endless. In particular, there is no restriction on the choice of number of internal coordinates. For example, in addition to v , d , a , profession, mobility history, etc can also be added as internal coordinates. Even though we have emphasized Covid-19 pandemic in the present paper, the proposed model can readily be used for forecasting any other infectious disease spread. In future, a data assimilative framework for a real-time update of forecasts can also be implemented. The nucleation model B nuc defined in (5) , C R (t) = 0.01 + t * 0.00058 t < 65 0.0475 else. Remark: A factor N S (t)/N(t) needs to be introduced in B nuc when herd immunity develops. For India, the data to estimate N D (x) is downloaded from a publicly sourced database 17 . To distribute the number density among the internal ordinates, we employ distribution fits as given in equation (10) . Data set of age downloaded from Statista 18 is used to fit a 1 , b 1 , c 1 of equation (10) . Age-structured impact of social distancing on the covid-19 epidemic in india Covid-19 epidemic study ii: Phased emergence from the lockdown in mumbai Seir and regression model based covid-19 outbreak predictions in india Predictions for covid-19 outbreak in india using epidemiological models. medRxiv An introduction to infectious disease modelling Covid Analaytics Website Population balance modeling. promise for the future Mathematical models of infectious disease transmission The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application An object oriented parallel finite element scheme for computations of pdes: Design and implementation Parmoon -a modernized program package based on mapped finite elements Operator-splitting finite element algorithms for computations of high-dimensional parabolic problems An operator-splitting Galerkin/SUPG finite element method for population balance equations: Stability and convergence Finite Elements: Theory and Algorithms. Cambridge IISc Series An operator-splitting finite element method for the efficient parallel solution of multidimensional population balance systems Both authors contributed equally and reviewed the manuscript. As the focus of the present paper is to introduce the spatio-temporal predictive modeling framework for infectious disease spread, we use simple estimates and algebraic relations for certain parameters. Further, a few more assumptions are made on some parameters due to lack of actual data. Nevertheless, the features of the model and the insights into the prediction of the spread can be seen even with these assumptions. Let T ∞ = 365 days, d ∞ = 400 days, a ∞ = 45, 625 days and Ω x := ∪Ω k , k = 1, . . . , M, where M being the number of states and union territories in India. We assume that there will be no inter-or intra-state and international movements, that is, u = 0, no growth in level of infection, that is, G v = 0, the growth age is negligible and no source. Nevertheless, population distribution in all three internal ordinates v , d and a are described and tracked. Hence, the PBE in model (1) becomesOperator splitting finite element scheme A finite element scheme 15 based on operator splitting 12, 13, 16 is used to solve the high-dimensional PBE model (11) . Applying, operator splitting to (11), we getStep 1. (x-direction) For given I(t a , x, ) with I(t a = 0, x,Step 2.for all x ∈ Ω x v ∈ L v and a ∈ L a such thatIn the x-direction, the evaluation equation (12) has to be solved for every ∈ Ω by considering as a parameter. Similarly, the system (13) has to be solved in d -direction for every x ∈ Ω x v ∈ L v and a ∈ L a by considering these variables as parameters. The backward Euler and discontinues Galerkin with upwind methods are used for the temporal and the spatial discretisation, respectively. The implementation of the splitting algorithm in the finite element context has been presented in these papers 12, 13, 16 .