key: cord-0921833-4poqm263 authors: Paul, Swarna Kamal; Jana, Saikat; Bhaumik, Parama title: Explaining Causal Influence of External Factors on Incidence Rate of Covid-19 date: 2021-09-18 journal: SN Comput Sci DOI: 10.1007/s42979-021-00864-6 sha: 69d74985a08219fc550233828b03e87766db9955 doc_id: 921833 cord_uid: 4poqm263 Classical susceptible–infected–removed model with constant transmission rate and removal rate may not capture real world dynamics of epidemic due to complex influence of multiple external factors on the spread and spatio-temporal variation of transmission rate. Also, explainability of a model is of prime necessity to understand the influence of multiple factors on transmission rate. Thus, we modified discrete global susceptible–infected–removed model with time-varying transmission rate, recovery rate and multiple spatially local models. We have derived the criteria for disease-free equilibrium within a specific time period. A convolutional LSTM model is created and trained to map multiple spatiotemporal features to transmission rate. The model achieved 8.39% mean absolute percent error in terms of cumulative infection cases in each locality in a region in USA for a 10-day prediction period. Comparison with current state of the art methods reveals performance superiority of the proposed method. A perturbation-based spatio-temporal model interpretation method is proposed which generates spatio-temporal local interpretations. Global interpretations are generated by statistically accumulating the local interpretations. Global interpretations of transmission rate for a region in USA shows consistent positive influence of population density, whereas, temperature and humidity have very minor influence. An experiment with what-if scenario reveals locality specific quick identification of positive cases, rapid isolation and improving healthcare facilities are keys to rapid convergence to disease-free equilibrium. A long-term forecasting of 160 days predicts new infection cases in a region in USA with a median error of 455 cases. Dynamical systems equations based on compartmental modeling of epidemiology have been widely used to predict the spread of an epidemic. Susceptible-infected-removed or SIR model is one such simplified set of differential equations to model the spread. However, accurately determining parameter values like the transmission rate for a specific disease is a challenge. The dynamics of a disease may vary across space and time. Many external factors may influence the transmission rate. Considering a constant transmission rate for a disease, grossly oversimplifies the model, thus compromising accuracy. Secondly, knowing the factors influencing the transmission rate and the dynamics of the influence can provide a vivid understanding of the disease progression. Incidence rate of an epidemic is defined as frequency of new observed infection cases, whereas, in vanilla SIR model the transmission rate (also known as contact rate) can be derived by dividing incidence rate with product of previous susceptibles and infected in a population. There are several different types of nonlinear incidence rate suggested in the literature [1] [2] [3] [4] [5] [6] [7] . However, most of them adopt some type of simple predefined function with few parameters to model the incidence rate. Simple functions have low representational capability. Thus, they may not capture the detail dynamical variations of the incidence rate caused by multiple factors. We propose a Convolutional LSTM-based spatiotemporal model to map the transmission rate of Covid-19 with respect to multiple input features and thereby map the derived incidence rate from transmission rate. The model can forecast incidence rate with high spatiotemporal resolution, provided availability of clean historical data in that resolution. Exploratory analysis reveals probable influence of external features on transmission rate and eventually helped in feature selection. A spatiotemporal local interpretation method of a black box model is proposed which in turn is used to explain the trained model. The explanations reveal local influence of different external features on the transmission rate. A generalized global explanation is also generated to find common influence of factors across multiple locations and over a period of time. We experimented with available data of Covid-19 across multiple regions of USA and the model achieved 7.95% and 0.19% mean absolute percent error in terms of forecasting transmission rate in each locality and cumulative total infection cases across the country in a 10-day prediction period, respectively. A quantitative comparison between the proposed method and other prior art models reveals exceptional performance in forecasting of the epidemic spread. The generated explanations reveals high influence of population density, somewhat medium influence of gender ratio and median population age on the transmission rate, globally. There are minor influences of temperature and temperature deviation but barely any observable influence of humidity. However, local influences of features vary widely across multiple small regions. A criterion for disease-free equilibrium within a specific time period has also been derived for discrete SIR model with variable transmission and recovery rate. A longterm forecast using the trained model and modified recovery rate to satisfy disease-free equilibrium criteria reveals rapid dampening of active infection cases to reach the baseline. However frequent spikes due to resurgence are seen in this scenario. A comparative study is made with forecasted dynamics using current normal recovery rate to reveal necessary actions for rapid containment of the disease. The curve for long-term forecasting (in a specific region of United States) with current normal recovery rate faithfully follows the actual data with a median error of 455 cases per day in a 160 days prediction period. The paper is organized as follows. We conduct a brief literature survey in the next section. The subsequent section briefly explains the discrete SIR model with variable transmission rate followed by which, spatiotemporal modeling of transmission rate is discussed. Then influence of external features on transmission rate are explained. After this, long-term forecasting of disease progression with a current normal scenario and a "what-if" scenario is explained. Before the last section, the key inferences from the study are discussed. The final section concludes the paper. Kermack and Mckendrick [8] modeled communicable diseases using differential equations. Hethcote introduced the SIR model [9] where population is compartmentalized into susceptible, infected and removed groups. A set of differential equations modeled the dynamics of population in different compartments. In traditional SIR model incidence rate or the number of new infections per unit time varies bilinearly with the number of infections and number of susceptible in a population considering the transmission rate as constant. However, assumptions like homogenous mixing, non-dependence on external factors, no psychological effects on population etc. may not be realistic in many cases. Thus, several authors [1] [2] [3] [4] [5] [6] [7] introduced different types of non-linear incidence rates mostly addressing the saturation and psychological effect. Saturation effect states that the incidence rate might slow down and saturate as number of infected individuals increases due to low availability of susceptible individuals. Psychological effect on the population results in increased cautiousness among susceptible individuals as the epidemic spreads thus, slowing down the transmission rate. Most of the incidence rates stated above satisfy weakly non-linear property and are too simple to capture any arbitrary effects of the environment. SIR model with time varying transmission and recovery rate have been studied in [10] and threshold theorems are derived. Liu et al. [11] introduced a time varying switched transmission rate to model nonlinear incidence. Hu et al. developed a modified stacked autoencoder model of the epidemic spread in China and they claimed to achieve high level of forecasting accuracy [12] . On observing a universality in the epidemic spread in each country, Fanelli and Piazza [13] applied mean-field kinetics of susceptible-infected-recovered/dead epidemic model to forecast the spread and provided an estimation of peak infections in Italy. Zhan et al. [14] integrated the intercity migration data in China with susceptible-exposed-infected-removed model to forecast an estimation of epidemic spread in China. Hong et al. [15] considered variable transmission rate of Covid19 and came up with variable R-naught factor of Covid-19. Xi et al. [16] used deep residual networks to model spatiotemporal characteristics of the spread of influenza and experimented with real dataset of Shenzhen city in China. Paul In SIR model the total population in a region is compartmentalized into three classes, namely susceptible (S), infected (I) and removed (R). Initially the whole population is in susceptible class. An individual can move from susceptible to infected class on contracting the disease. An infected individual can move to removed class by either getting recovered and immune to the disease or deceased. The dynamics of the disease spread can be modeled by the following set of differential equations where (t) is disease transmission rate or contact rate and (t) is removal rate which is sum of recovery rate and mortality rate. It is assumed the population size ( N ) remains constant during the course of epidemic. S(t) , I(t) and R(t) are scaled as fraction of total population. Thus, the following equation holds true From [10] we get the following ∀t > t 0 , where r = R t 0 , R(t) ≥ r∀t > t 0 and I(t) ≥ 0 We consider discrete time steps in our modeling and measurements are taken on daily basis. Thus, replacing differential with difference equation: Solving for I(t): (5) Expanding log as Taylor series and taking only the first term, Considering a constant average difference between transmission rate and removal rate = − within the period t 0 = 0 and t = T such that 0 ≤ ≤ 1: Considering NI(T) < 1 as disease-free equilibrium state, the upper bound of can be derived as following such that the epidemic reaches baseline in time T: Maintaining > 0 asymptotically converges the total infection count to 0 at exponential rate thus makes the disease-free equilibrium stable. Assuming a constant mortality rate, from Eq. (10) it can be deduced that increasing the recovery rate will directly reduce the time span of the disease outbreak. However, there is a hard limit for the removal rate, (t) ≤ 1 . But (t) can be greater than 1, specially during initial outbreak when total infection count is low. In such situation dampening the spread of infection will not be possible only with treatment facilities. Immediate restriction of mobility in area of outbreak and rapid isolation of infected individuals can reduce the transmission rate. Once it comes down below 1, enhanced treatment facilities can increase the recovery rate, thus reducing the span of the disease outbreak. The transmission rate can vary spatially as well as temporally based on multiple variables. Geographical location, weather conditions [18] , human mobility [14] , population statistics might be some of the impacting factors changing the dynamics of the spread. An exploratory analysis reveals probable dependency of multiple spatial and temporal features on the transmission rate. Spatially co-located regions might have similar dynamics of the spread with high autocorrelation of transmission rate in a localized region. However distant regions may have dissimilar transmission dynamics with low correlation. Thus, a large geographic area has been divided into small regions called as grids. Each grid has been divided even further into smaller regions called pixels. A population within a pixel is assumed to be constant and transmission dynamics is modeled by separate SIR models for each pixel. Each grid consists of co-located regions which might be impacting each other's transmission rate. Feature is constructed for each grid as multichannel temporal sequence of matrices which in turn is used for training a ConvLSTM [19] network to model the transmission rate. Data has been obtained for a region in United States from multiple sources [20] [21] [22] [23] [24] [25] [26] . The time span of the data is from 2020-03-21 to 2020-05-11. Covid19 daily data for USA at county level are filtered by a spatial region of USA as shown in Fig. 1 . The region is geospatially divided into M × N grids of equal sizes bounded by calculated latitudes and longitudes. Figure 2a illustrates a grid bounded by latitudes and longitudes. The dotted line square is called as frame. The overlapping areas in all 4 directions in a frame allows flow of spatial influence from neighboring grids. A frame is in turn divided into L × L pixel. Each pixel represents a bounded area in geospatial region. Each pixel contains a value mapped to certain feature in the bounded geospatial region. Frame matrices are constructed for each feature and concatenated through a third axis called channels. For example, transmission rate and population density are two features and they represent two separate L × L matrices in a frame concatenated across a third axis. Some features like transmission rate, active infection fraction, weather etc. are distributed spatio temporally. Whereas, other features like population density, female fraction, median age are assumed time invariant and have no temporal component. Thus, they are only distributed spatially and copied along temporal axis. Population density has been log transformed to reduce skewness and normalized. Other features are only normalized in 0-1 scale. Daily transmission rate and removal rates at pixel level have been calculated as following, where i ∈ {1 … n} denotes each pixel, ΔI + i (t) and ΔR i (t) are fraction of new cases in infected class and new individuals in removed class respectively at time t in pixel i . N i represents population in pixel i and other notations hold usual meaning as stated in "Discrete sir model with variable transmission rate": Each training sample of a frame is represented by a tensor of dimension T × L × L × C, where T is the total time span and C is number of channels or features. As shown in Fig. 2b each training sample is generated by sliding a time window of size W +1 by 1, leaving behind a test case sample of time window size W ′ in the most recent period. Number of training samples for a frame can be calculated as Thus, total number of training samples S train for all frames can be calculated as The forecasting problem is framed as supervised learning problem. Given a sequence of observed multichannel frames of spatial data as matrices X 1 , X 2 … X t the objective of the model is to predict the next single channel frameY t+1 . The training samples are divided into input sequences of length W and output frames. The model forecasts the transmission rate in each pixel in a frame for each timestep. Thus, the output frame consists of only one channel. The input training dataset ( X train ) can be represented as a tensor of size S train × W × L × L × C and the output dataset ( Y train ) as S train × W × L × L × 1 . For training, the input sequences are selected from all frames having non-zero total infection count. Figure 2b illustrates the sequence of a frame. The frames t-7 to t-3 represents an input training sequence ( X train ) of lengthW . The output frame ( Y train ) for this training sample is t-2. Other training samples are generated by sliding the window W + 1 backwards in time by 1. The most recent frames t-0 and t-1 represents the test output frames ( Y test ) and immediate sequence of frames t-6 to t-2 is the test input sample ( X test ). The test set X test is represented by a tensor of size Table 1 presents a sample data for pixel 101 in grid 387 which represents the counties New York and Bronx. The data shows values of different calculated fields based on new infection cases, new deaths, new recovery cases reported per day and the total population in the pixel. The data is shown for first 6 days only, starting from 2020-03-21. The values ΔI + i , ΔR i , S i and I i are multiplied with total population N i to illustrate the actual counts. The population N i is calculated as 60% of the actual total population in the counties as it is assumed, based on reported R-naught factor of Covid-19, herd immunity [27] will be achieved once approximately 60% population gets infected. The distribution of calculated i is highly skewed with some outliers. Thus, value of ′ i is calculated by dividing i with its 95th percentile value and hard limiting the maximum value to 1. The variable ′ i is finally used to construct the frames. rh i tabulates the relative humidity in the region on daily basis. ND i represents the normalized population density which varies spatially but temporally it is fixed. Figure 3 illustrates the image of individual channels of the frames constructed from the data of grid 387 and its neighboring grids. Each frame is divided into 16 × 16 pixels including an overlapping margin of 4 pixel on all sides. Values of each pixel is represented in greyscale shade where a darker shade denotes higher value. The first three images (starting from top) depict the frames for transmission rate ( ′ i ) in day 0, 2 and 5. Similarly, the next three images represent relative humidity. The last 3 represents population density, female fraction and median age and they do not vary temporally. The pixel marked in red circle is the pixel 101. These frames are combined to create a temporal sequence of multichannel images and fed into the model as training data. The primary purpose of the exploratory analysis is to understand the distribution of transmission rate and identify probable influence of different features on the transmission rate. Eight external features are analyzed against transmission rate to find probable influence. Among eight features, four are spatial features having no temporal component, namely population density, housing density, female fraction and median age. The reason for choosing these external factors is twofold. First, other data like human mobility is not publicly available at county level resolution. Second, it can be presumed that these factors might affect the transmission rate as dynamics of the virus might vary based on weather conditions, age and gender. Also, in general it is presumed that region with higher population density generally have higher transmission rate. Figure 4 illustrates scatter charts between average transmission rate and four spatial Table 1 Sample data for grid 387 and pixel 101 Day features for multiple pixels. The color gradient represents log transformed cumulative number of infection cases in each pixel. Only those pixels are filtered which experienced at least 30 days of running infection cases and having at least 10 cumulative infection cases at the beginning of the observation period. Figure 4a , b displays scatter charts and regression lines of average transmission rate with respect to population density and housing density in each pixel, respectively. The two external features are log transformed and scaled to get upper bounded by 1. Log transformation reduces skewness and influence of outliers in the data. As observed in the charts the transmission rate is positively correlated with both the features which is quite intuitive. Places with high population density is expected to experience rapid spread of the disease. Locations with high population density also experienced highest number of cumulative cases. Figure 4c , 4d displays scatter charts and regression lines of transmission rate with respect to female fraction and median age of the population respectively. In Fig. 4c , 16 pixels have been filtered out having female fraction less than 0.45 to remove the skewness in the data. There is a slight positive correlation between female fraction and transmission rate. However, this might not invoke a suggestive idea about the dependency of this external feature on transmission rate as majority of the pixels resides in the range of 0.50-0.53 female fraction with barely any trend in that range. Also, there is an indirect correlation as in general pixels with high female fraction have high population density. Median age has negative correlation with transmission rate. There is an indirect correlation in this case also, as in general pixels with high median age have low population density. Another intuitive assumption can be, population with high median age are less mobile thus restricting the spread of the disease. Apart from four spatial features four other external spatiotemporal features are analyzed to observe any influence on transmission rate. Figure 5 illustrates time lagged cross correlation between transmission rate and other spatio-temporal features at pixel level. The external features are time lagged from 0 to 15 time steps and cross correlated with transmission rate for each lag. In the plot, pixels are arranged in increasing order of total infection cases. Figure 5a , b shows the plot of cross correlation of transmission rate with respect to average daily temperature and 3-day running window temperature standard deviation, respectively. Average temperature is slightly positively correlated in time lag range of 0-5. of transmission rate with respect to average daily relative humidity and daily removal rate, respectively. There is an overall positive correlation with respect of relative humidity specially in pixels with highest infection cases. Removal rate is mostly negatively correlated with transmission rate except in few pixels having high infection cases. Correlation might not represent causality. Thus, we performed granger causality test [28] of transmission rate with respect to different features. Granger causality is a statistical hypothesis test for finding if one time series can help improving the forecasting accuracy of another time series. It might not measure true causality rather it measures predictive causality. Chi-squared test is chosen as the hypothesis testing method and minimum pvalues for each pixel are calculated. Augmented Dickey-Fuller test [29] is performed to test stationarity of all timeseries. Table 2 displays the result of granger causality and Dickey-Fuller tests. The column '% of p value < 0.05' represents percentage of pixels for which the granger causality test gave p value less than 0.05 for each feature. The column '% of ADF < 10%' represents percentage of pixels for which the Dickey-Fuller test gave test statistic less than 10% critical value and having p value less than 0.1 for each feature. From the observed results it seems for majority of the pixels the weather features and removal rate have predictive causal relation with transmission rate. Also, for majority of the pixels the feature timeseries are stationary or weakly stationary. Recurrent neural networks (RNN) are a class of artificial neural networks with nodes having feedback connections, thereby allowing it to learn patterns in variable length temporal sequences. A simple RNN have a feedback loop which is associated with hidden state weights. Figure 6 illustrates an RNN unfolded in time. It has hidden states h(t) which changes with time. The inputs and outputs are represented by x(t) and y(t), respectively. The dependency on historical values of the sequence is captured by the relationship between the hidden states. W h , W x, W y are the weights which are learnt through backpropagation during training. However, it becomes difficult to learn long-term dependencies for traditional RNN due to vanishing gradient problem [30] . LSTMs [31] solve the problem of learning long-term dependencies by introducing a specialized memory cell as recurrent unit. The cells can selectively remember and forget long-term information in its cell state through some control gates. We presume there might be long-term complex dependencies of several factors on transmission rate. Thus, LSTM seems a preferred choice for time series modeling. In convolutional LSTM [19] a convolution operator is added in state to state transition and input to state transition. All inputs, outputs and hidden states are represented by 3D tensors having two spatial dimensions and one temporal dimension. This allows the model to capture spatial correlation along with the temporal one. In our model we configured multichannel input such that distinct features can be passed through different channels. Multiple convolutional LSTM layers are stacked sequentially to form a network with high nonlinear representation. The final layer is a 2D convolutional layer having one filter which constructs a single channel output image as the next predicted frame. The model is tested by feeding in input sequence of frames and next frame is predicted which in turn is combined with other features along channel and appended with the input sequence. The new input sequence is fed to the model again to get the next predicted frame. This continues until forecasting completes for a desired time period. "Mean absolute percent error" (MAPE) and Kullback-Liebler (KL) divergence [28] are used to measure the accuracy of the model. The model predicts the transmission rate for a future time period for each pixel which in turn is used to calculate daily new infection cases ΔI + i (t) using Eq. (11) . The removal rate is estimated as running average of previous 3-days and daily removed cases are calculated using Eq. (11) . The active infection cases ( I i (t)) and susceptibles ( S i (t)) are also calculated. Cumulative infection cases ( ∑ ΔI + i (t) ) are calculated by summing up all new infection cases upto a certain day. MAPE of transmission rate is calculated at pixel level for KL divergence at pixel level is calculated for modified transmission rate in the prediction period to measure the dissimilarity of distribution of predicted transmission rate with respect to actual. is softmax function applied after scaling a series in 0 to 1 scale and P(X) is probability distribution of X . Softmax is applied to convert transmission rate as probability distribution across pixels. Since KL divergence measures the dissimilarity between two distribution thus a lower value of it indicates better performance of the model: MAPE is also calculated at grid and country level with respect to cumulative predicted infection cases across the region during the prediction period: The model is constructed by stacking 4 Convolutional LSTM layer sequentially and terminating the network with a Convolutional 2D layer. Figure 7 Illustrates the ConvLSTM model. The final layer is followed by an exponential linear unit as activation. The input and other hidden Convolutional LSTM layers are followed by sigmoid activation. Each Convolutional LSTM layer has 32 filters and kernel size 3 × 3. The input layer is configured to take tensors of size 16 The dataset has a time span of 51 days, out of which data from 42nd to 51st day is used for testing the model and rest for training and validation. Figure 8 illustrates the plot of training/validation loss and training/validation mean squared error (MSE). Both the loss and MSE consistently decreased for training and validation as epochs increased. Table 3 The prediction accuracy and resolution of the proposed model have been compared with few current state of the art epidemic models. Namely, susceptible-infected-removed (SIR), susceptible-exposed-infected-removed (SEIR) [32] , susceptible-exposed-infected-recovered-dead (SEIRD) model [33] and ensemble of ConvLSTM networks [17] . For mathematical models the parameters are estimated using three different constraint-based optimization methodslimited memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS-B) [34] , sequential least square programming (SLSQP) and trust-constr from SciPy [35] . To make them a spatio-temporal model, separate model parameters are generated for each county. Comparison is done based on KL divergence of active Infection cases, county MAPE and country MAPE on cumulative cases. The metrices are calculated for a 10 day prediction period. Table 4 presents the comparison results. Evidently the proposed model (using Conv-LSTM) outperforms most of the state-of-theart methods. There have been several works done on similar lines, however the novelty of the proposed method of forecasting lies both on the approach and the results. There have been several models proposed in the literature employing nonlinear [1] [2] [3] [4] [5] [6] [7] incidence rate. However, the non-linearity is mostly represented using simple predefined functions mainly assuming psychological and saturation effects. A fixed function for modelling incidence rate is not flexible to capture varying environmental factors that may affect the incidence rate. Few time-varying [10, 11] incidence rates are also proposed but they do not consider spatial correlation. Whereas, the proposed method does not model the transmission rate as a predefined function, rather it is represented as a complex spatio-temporal function that is learnt by the neural network through training data. In comparison to these mathematical models the deep neural network can represent the model as arbitrarily complex function and can capture dynamics of influencing factors at much higher resolution. The model is flexible to accommodate any number of influencing factors as input (in separate channels). As shown in the experiments the transmission rate is modeled as a function of weather, population, gender ratio, median age and active cases fraction. Along with that the model inherently captures the spatial and temporal influences. Modeling in this approach has twofold benefits. It improves the forecasting accuracy in many cases (as shown in the experimental results) and it allows to generate explanation of influencing features on the transmission rate (as shown in "Explaining influence on transmission rate"). Deep learning-based spatio-temporal model has already been used to predict spread of epidemic [16] . But this has been applied in a small localized region ignoring many of the external factors. In comparison the proposed method has been applied to create a single neural network model for the whole country (USA) without compromising spatial granularity. The data preparation method automatically divides the whole region into grid and pixels based on latitude and longitude, thus adjacency matrix of different locality (county in this case) is not needed. A Conv-LSTM-based ensemble spatiotemporal model has also been used to forecast epidemic spread [17] . However, this model directly tries to estimate the new infection cases instead of transmission rate and it does not consider external factors like weather, gender ratio. The time series of new infection cases is nonstationary; whereas, transmission rate is usually stationary or weakly stationary. Thus, creating a long-term forecasting model in the earlier scenario is not feasible. Also, modeling transmission rate instead of new infection cases allows the standard SIR method to calculate new infection cases from transmission rate. One of our goal of this study is to understand how different external features are influencing the transmission rate. We expect to find simple interpretable predictive influence relations between transmission rate and different features. One of the ways to find such relations is building an accurate predictive model followed by explaining the predictions in terms of input features. As described in previous sections deep neural networks can model the dynamics of epidemic quite accurately due to its capability of high nonlinear representation. However, high accuracy is a tradeoff against model interpretability. Given the complexity of the Convolutional LSTM network used to model the transmission rate it is nearly impossible to find how each feature is influencing the transmission rate just by studying the weight matrices. Using a high bias predictive model like linear regression or shallow decision tree not only reduces the accuracy but also drops interpretability [36] . Simple models can serve as interpretable models but may fail to capture true relations among features globally. This problem can be solved by building simple local models and drawing local explanations of feature relations. However, there may not be enough data points available or data distribution may be highly skewed in a local region to confidently build a predictive model and draw interpretations on it. Thus, we use the trained Convolutional LSTM model as the global model and draw spatio-temporal local interpretations of it using locally perturbated synthetic data by satisfying a criterion called local fidelity [37] . Local fidelity suggests the explanations should be locally faithful with the model behavior. Local fidelity does not imply global fidelity however global fidelity implies local. To increase interpretability simple surrogate models can be trained with local data as it is expected that the response variable varies with the features almost linearly in a local region. In fact, there is a tradeoff between local fidelity and interpretability that needs to be made. Model agnostic methods perturbs the input features in a local region around a single or a group of datapoints and feeds the model to obtain predicted response variable. This synthetic data is in turn used to train simple surrogate models to obtain local interpretations of global model. There are several existing methods available in the literature to derive local interpretations of a model [37] [38] [39] . Few works also proposed methods to derive global explanations from local interpretations of any black box models [36, 40] . Deriving explanations from a black-box model requires optimization of the following objective function, where G is set of interpretable surrogate models in a locality, f is the global model to be explained, l x is the distribution function defining the locality of x , L is the loss function and Ω g is the complexity of the model g: It is desirable to minimize both Ω g and L . However, in general they are inversely proportional when the spread of l x is large. A very small spread of l x is also not desirable as it will oversimplify g to draw any meaningful explanations in the locality. Thus, a choice of l x is important to derive meaningful interpretations. The locality of x is defined by a threshold distance in all directions from x both spatially and temporally and it is defined by the following tuple, where x s and x t are spatial and temporal components of observation x . d s and d T are spatial and temporal threshold distances from x to the boundary of locality: Figure 10 illustrates spatiotemporal locality of observation x . Spatial locality is bounded by pixels up to d s in all directions from x s such that locality of x s is bounded by a square box of pixels of size (2d s + 1) × (2d s + 1) . No paddings are applied at the edges. Thus, perimeter, defining locality of pixels at the edges of a frame are trimmed. As illustrated in Fig. 10 temporal locality is also defined similarly. Combining spatial and temporal locality the local region of observation x is defined by a sequence of group of pixels with equal time lead and lag from x unless x resides on temporal edge of an input tensor in which case temporal locality is trimmed on the direction of the edge. Perturbated data points are generated by randomly perturbing the pixel values of x following a uniform distribution. Perturbated distribution is calculated separately for each feature. The perturbated sample distribution is calculated as following, where U(k, k � ) is uniform distribution with upper and lower bound as k , k ′ , (l(x)) is standard deviation of all observations in the locality of x and rand randomly selects one sample from two: The spatial features are only perturbated spatially and same values are copied temporally along the corresponding channel. The channels having temporal component are perturbated for different time slice within an input tensor. Each perturbated pixel in a time slice represents a separate feature. Input tensors are constructed using the perturbated values and passed through the blackbox model f to generate a predicted output value. The set of all input perturbated data points of x and the corresponding predicted output values serves as the training dataset for the surrogate model g . Each input channel and the predicted values are normalized to 0 Fig.10 Illustration of spatio temporal locality mean and 1 standard deviation prior to training the surrogate model. Normalization is done to convert the features into same scale so that coefficients of a linear regression surrogate model gives the relative influence of the features on the response variable. Thus, the loss function is defined as following, where the function S constructs the input tensor in the original representation from perturbated samples: Though Z x can be created by perturbing all features of a pixel in each channel within an input tensor, however, in our analysis only a subset of all features are perturbated to produce Z x and to find effect of those features on transmission rate. Other features are kept constant as per the original observation. Intuitively this will explain the effect of the chosen features on the transmission rate in a single pixel area given all other parameters remain constant, including feature values of spatio temporal neighboring pixels. In our analysis Z x is created by perturbing the following features only. Population density, female fraction, median age, weather at 4th, 5th and 6th time lag. Weather includes average daily temperature, 3-day temperature standard deviation and average daily relative humidity. Apart from the weather features the other three features have no temporal component. So, for them the perturbated values are copied temporally in the input tensor during reconstruction. The weather features from 4th to 6th time lag are chosen by assuming the average incubation period of Sars-Cov-2 is between 4 and 6 days. The spatial ( d s ) and temporal ( d T ) distance for defining locality is taken as 1. The perturbated samples for each feature are generated by Eq. (18) . Local interpretations are carried out for each pixel which experienced at least 10 cumulative infection cases on 21st March 2020. The objective is to deduce the influence of aforementioned features on the transmission rate in each pixel given all other parameters remain constant. 250 perturbated input samples are generated for each pixel. The samples are reconstructed in tensor format and fed to the model to obtain the predicted transmission rate and together they form the input output samples. For each pixel a linear regression surrogate model is trained with the training samples. The coefficients of each feature denote the influence on the transmission rate. Figure 11 illustrates the feature influence chart for different pixels in grid 387. We choose grid 387 as it experienced highest number of cumulative infection cases with nearly 10% of total infection cases in USA as of 1st May 2020. Only those coefficients are plotted which have p value < 0.05. The features whose absolute value of median and standard deviation across all days are less than 0.05, are considered unimportant and filtered out from the plot. The counties covered by each pixel in grid 387, which have nonzero population, are stated in average. At grid level population density and female fraction positively impacts transmission rate on daily basis. Median age has minor positive impact on earlier days and negative impact on later days. Figure 11d shows median of influence across all days for different pixels in grid 387. Population density and female fraction have positive impact across all pixels. Median age closely resembles a sinusoidal curve which implies that its influence varies widely across pixels. Figure 12 illustrates the global effects of the features on transmission rate. To generate global interpretations local surrogate models are built for each pixel with 100 perturbated samples. For each feature the distribution of influence values for all pixels with nonzero population is plotted against time. Considering the median of the distribution, population density and female fraction have positive impact across all days; whereas, median age has negative impact. Temperature has minor positive impact, temperature standard deviation has minor negative impact and relative humidity barely has any noticeable impact on transmission rate. From this study it is clear local influence of features at pixel and grid level may widely deviate from global average. This is important as spread of infection is highly skewed regionally such that few hotspots contribute majority of the infection cases. Thus, studying the local influence of features can shed light on the local dynamics of the spread and at the same time global influence charts provides a general idea of the influence on spread. The accuracy or fidelity of explanations with respect to the base model is measured by the R-squared obtained by training all the surrogate models. R-squared measures the explainability of the variance in the input feature set. Higher value denotes better fitment of the model with respect to the data and consequently better faithfulness with respect to the base model in a local region of data. Figure 13a illustrates the histogram of R-squared of all the surrogate models used to generate global explanations. Figure 13b , c shows the histogram for grid 387 and pixel 101 in grid 387, respectively. The median in all the cases lies close to 1 which suggests faithfulness of the surrogate model with respect to the base model in local regions. The base model is already evaluated and proven to accurately represent the real-world dynamics of the spread. Thus, it can be presumed the generated explanations closely approximates the real-world scenario. Classical SIR model assumes a constant transmission rate and it typically predicts a smooth bell curve of active infection cases with respect to time with a single peak. However, transmission rate may vary with respect to multiple external factors including intervention methods like lockdown. A variable transmission rate may result in periodic subsidence and resurgence of the spread of infection and in turn producing multiple peaks of active infection cases along time. Along with this the recovery rate may also change due to multiple intervention methods like enhancing hospital facilities, improving treatment procedure etc. As shown in Eq. (10), recovery rate is very important in achieving disease-free stable equilibrium state. In general, the average removal rate (recovery rate + death rate) over a period should exceed average transmission rate to reach the disease-free equilibrium. Considering the death rate to be constant and quite small compared to recovery rate of Covid-19, the time required to reach the equilibrium state is inversely proportional to the difference between recovery rate and transmission rate. In our experiments we used the trained model to do long-term forecasting of the epidemic with current normal parameters and compared with an "what if" analysis by modifying the removal rate. A 160 days forecasting is carried out for the grid 387. Since weather features barely impacts transmission rate in grid 387 thus the model trained without weather features is used for forecasting. "What if" analysis is done by setting high removal rate to expedite disease-free equilibrium and compared with current normal forecasting by setting removal rate as running average of past 5 days. In "what if" analysis, removal rate is set as per Eq. (10) by setting T = 200 with upper hard limit as 0.2. As removal rate changes daily active infection cases which in turn impacts future transmission rate and due to upper hard limit of removal rate the value of in some pixels is less than upper bound calculated by Eq. (10). From Fig. 14a, b , it is evident that number of active infection cases reduced much faster in the "what if" analysis and most of the pixels hit near baseline state at least once within 200-day period. However rapid periodic resurgence of the disease is seen in this case. As recovery rate has upper hard limit thus in some cases resurgence with high transmission rate resulted in destabilizing disease-free equilibrium. The growth is again quickly dampened due to high recovery rate in future periodic resurgences. This can be empirically explained by the fact that population gets cautious and maintains social distancing with low intermixing when infection cases are high and vice versa. Figure 14c , d suggests there is rapid periodic resurgence of new infection cases in "what if" analysis compared to current normal and multiple short new infection periods are seen. The resurgences in some cases (pixel 85, 101) are stronger compared to current normal. Thus, it is evident, by only increasing recovery rate abruptly, infection spread may not be controlled fully unless other intervention methods are adopted to prevent spike of transmission rate during resurgence periods. Figure 15a , b shows the plot of daily active infections when only pixel 101 and 102 are subjected to modified recovery rate, respectively, and other pixels are set with current 15 "What if" analysis with increased recovery rate for specific pixels in grid 387. Daily active infections with normal running average removal rate for all pixels except a pixel 101 and b pixel 102, where recovery rate is set to a high value in the forecast period normal recovery rate. In both the cases there is a quick dampening of active cases in 101 and 102 pixels and resurgence spike is shorter and weaker compared to Fig. 14b . It is evident there is spatial influence of neighboring active cases on transmission rate. One explanation can be, isolated intervention measures to dampen the spread does not breaks the cautiousness and preventive measures among the population. This makes determining an ideal recovery rate for a region a complex optimization problem. Figure 16 shows active infection cases at grid level quickly reaches baseline in "what if" scenario compared to current normal, but it is not eradicated fully. There are also small periodic spikes in future. The current normal scenario suggests unless strict intervention actions are not taken to reduce transmission rate or recovery rate it is going to take long time to reach the baseline. The trace of new infection cases suggests the trend is quite similar in both the scenarios with more frequent and stronger spikes in "what if" scenario. In current normal scenario the model estimates a total of 379,371 new infection cases and 509,160 removed cases in 160-day period. In "what if" scenario it estimates 349,139 new infection cases and 712,624 removed cases. However, Fig. 16c suggests most of the removal happens in initial 50 days of forecast period due to abrupt increase of removal rate in forecast period. In real world such abrupt increase of removal rate may not be possible. However, on an average if the difference between removal rate and transmission rate can be maintained as per Eq. (10) it is possible to dampen the spread of infection within desired time period. Figure 17 illustrates the plot of predicted vs actual new infection cases for 160-day prediction period for grid 387 and 384. Figure 17a shows the plot of predicted vs actual values for grid 387 and Fig. 17b shows the plot for grid 384. The median error for grid 387 is 455 cases per day and for grid 384 is 18 cases per day. Thus, the long-term prediction of the proposed model follows the actual curve quite faithfully, though the training data was of much shorter duration (41 days) compared to the forecast period (160 days). The study reveals both downward and upward trends of active infection cases in the long-term forecast period and it nearly matches with the actual data. The proposed method can be used to model the epidemic spread of any country with high spatiotemporal resolution as the data preparation method is generic and does not need any country specific geospatial information. It only needs the boundary latitude and longitude of the country. The model can be used for both short term and long-term forecasting. The study also reveals the transmission rate varies widely across different locality and time. Thus, a single parametric model for a wide region and a long period of time will grossly misrepresent the actual dynamics. The transmission rate also varies in complex manner with respect to multiple external factors. Thus, simple functional approximation will also oversimplify it. As per the influence charts the influence of some of the features like gender ratio and median age varies widely across multiple localities in a region in USA. Whereas, population density has consistently positive influence on the transmission rate and weather features have very minor influence. The what-if analysis presents a case study to reveal necessary steps for dampening the spread of the epidemic in a region in USA. The model can be used to calculate recovery rate in a locality to reach disease-free equilibrium in a specific period of time. Though in our analysis we took recovery in strict sense; however, it may not refer to complete recovery. Identification and complete isolation of a patient such that there is negligible chance of further spread of the infection from the patient may also be referred to recovery. Thus, maintaining high recovery rate, rapid and strict isolation of infected patient and intervention methods to reduce transmission rate are the keys to rapid convergence to disease-free equilibrium. A thorough study on the transmission rate of Covid19 in USA revealed several insights. Key influencers are identified. However, there might be other influencers like human mobility, demographics, government interventions etc. On availability of those feature data, proposed methods may be applied to find influences. These methods can also be applied to other countries. Though a threshold condition is derived for disease-free equilibrium, yet it is not straightforward to determine ideal recovery rate to rapidly dampen the infection spread due to complex dependency of transmission rate. A general solution method may be investigated to solve this optimization problem and come up with ideal regional recovery rate. The predictive model can also be enhanced to predict other parameters like recovery rate. Influence of nonlinear incidence rates upon the behavior of SIRS epidemiological models Nonlinear biological dynamics system A delayed epidemic model with pulse vaccination Regulation and stability of host-parasite population interactions: I. Regulatory processes Analysis of a delayed SIR model with nonlinear incidence rate. Cairo: Discrete Dynamics in Nature and Society Dynamic analysis of an SIR epidemic model with nonlinear incidence rate and double delays A simple sis epidemic model with a backward bifurcation A contribution to the mathematical theory of epidemics Qualitative analyses of communicable disease models On the final size of epidemics with seasonality Infectious disease models with time-varying parameters and general nonlinear incidence rate Artificial intelligence forecasting of covid-19 in china Analysis and forecast of COVID-19 spreading in China, Italy and France Modelling and prediction of the 2019 coronavirus disease spreading in China incorporating human migration data Estimation of time-varying transmission and removal rates underlying epidemiological processes: a new statistical tool for the COVID-19 pandemic A deep residual network integrating spatial-temporal properties to predict influenza trends at an intra-urban scale A multivariate spatiotemporal model of COVID-19 epidemic using ensemble of ConvLSTM networks Temperature and latitude analysis to predict potential spread and seasonality for covid-19 Convolutional LSTM network: a machine learning approach for precipitation nowcasting US COVID-19 Daily Cases with Basemap US COVID-19 Daily Cases with Basemap An overview of the global historical climatology network-daily database climate reference network after one decade of operations: status and assessment Herd immunity: a rough guide Investigating causal relations by econometric models and cross-spectral methods Distribution of the estimators for autoregressive time series with a unit root The vanishing gradient problem during learning recurrent neural nets and problem solutions LSTM can solve hard long time lag problems A simulation of a COVID-19 epidemic based on a deterministic SEIR model. Front Public Health Identification and Estimation of the SEIRD Epidemic Model for COVID-19 On the limited memory BFGS method for large scale optimization SciPy 1.0: fundamental algorithms for scientific computing in Python From local explanations to global understanding with explainable AI for trees Explaining the predictions of any classifier How to explain individual classification decisions Model agnostic supervised local explanations Anchors: highprecision model-agnostic explanations The authors declare that they have no conflict of interest.