key: cord-035277-napw1pxe
authors: Paul, Swarna Kamal; Jana, Saikat; Bhaumik, Parama
title: A Multivariate Spatiotemporal Model of COVID-19 Epidemic Using Ensemble of ConvLSTM Networks
date: 2020-11-11
journal: J
DOI: 10.1007/s40031-020-00517-x
sha: 
doc_id: 35277
cord_uid: napw1pxe

The high R-naught factor of SARS-CoV-2 has created a race against time for mankind, and it necessitates rapid containment actions to control the spread. In such scenario short-term accurate spatiotemporal predictions can help understanding the dynamics of the spread in a geographic region and identify hotspots. However, due to the novelty of the disease there is very little disease-specific data generated yet. This poses a difficult problem for machine learning methods to learn a model of the epidemic spread from data. A proposed ensemble of convolutional LSTM-based spatiotemporal model can forecast the spread of the epidemic with high resolution and accuracy in a large geographic region. The feature construction method creates geospatial frames of features with or without temporal component based on latitudes and longitudes thus avoiding the need of location specific adjacency matrix. The model has been trained with available data for USA and Italy. It achieved 5.57% and 0.3% mean absolute percent error for total number of predicted infection cases in a 5-day prediction period for USA and Italy, respectively.

Wuhan city in China initially observed an outbreak of Covid-19 disease caused by SARS-CoV-2. Eventually it became a pandemic and more than 200 countries are fighting hard to contain the infection. One of the best ways to contain the infection is rapid identification of positive cases and isolation. Forecasting regional spread can help identify future hotspots and distribution of infection which would eventually help to take containment measures.

A spatiotemporal epidemic spread model can accommodate both spatial and temporal correlations in data. However, most of the models either require diseasespecific domain knowledge [1] or are too spatially coarse [2] . Deep learning models can learn the dynamics of epidemic spread with high spatial resolution and high degree of accuracy with minimal initial bias due to its capability of high nonlinear representation. Deep neural network-based spatiotemporal models [3] have already been applied to predict epidemic spread. However, this model is experimented on a small localized region and influence of external factors are ignored. Deep learning models also tend to overfit due to its high representational capability. Due to availability of very limited dataset, the problem of overfitting looms large in this case. Thus, modelling of Covid-19 spread in a wide region with high spatial and temporal resolution is challenging.

To address the problem of spatiotemporal prediction of Covid-19 spread in a large geographical region with high resolution, an ensemble of Convolutional LSTM [4]based model is proposed which needs to be trained with multilayer temporal geospatial data, transformed as sequence of frames. Each layer of the geospatial data corresponds to a causal factor that might influence the spread of the epidemic. An ensemble of models helps reduce variance in errors and overfitting on small training dataset. Experimentation is carried out with data of USA and Italy and achieved country-level mean absolute percent error (MAPE) of 5.57% and 0.3%, respectively, on forecasting of total infection cases in 5 days period.

The paper is organized as following. A brief discussion on modelling the epidemic spread and feature engineering is presented in Sect. 2. In Sect. 3, the ensemble of Convolutional LSTM model and performance measurement metrics are presented. Section 4 is about experimental results. The following section concludes the paper.

Disease spread is a complex dynamical system, and numerous factors contribute to the dynamics of spread making it non-stationary. Covid-19 is no different. Geographical location, weather conditions [5] , human mobility [6] , and population statistics might be some of the impacting factors changing the dynamics of the spread. Epidemic spread is correlated in time as well as spatial dimensions. However, it may be spatially autocorrelated in a small localized region but not across wide regions. Thus, a large geographic region is divided into relatively smaller grids and model is trained with samples drawn from local distribution of infection cases. The objective of the model is to forecast new cases of infection on daily basis in different regions across a country which can be added up to calculate total cases of infection.

All the observations in the dataset are mapped to a spatial region bounded by defined latitude and longitude. The region is geospatially divided in M 9 N grids of equal sizes bounded by calculated latitudes and longitudes. Figure 1a illustrates a grid bounded by latitudes and longitudes. The box represented by the dotted line is called as frame. The frames have overlapping areas in all 4 directions. The overlap allows flow of spatial influence from neighbouring grids to another. Each frame is in turn divided into L 9 L pixels which includes the overlapping area. Each pixel represents a bounded area in geospatial region. The values in each pixel are mapped to certain feature in the bounded geospatial region. Separate frame matrices are constructed for each feature and concatenated through channels. For example, new infection count and population are two features and they represent two separate L 9 L matrices in a frame concatenated across a third axis. Each pixel in the infection count matrix contains the count of new infections (DI) in the pixel area in a day. Infection count is distributed both in spatial and temporal dimensions. To reduce the skewness, the infection count in a pixel is log transformed and normalized in 0-1 scale. Considering R0 factor of Covid-19 between 2 and 3, it is calculated that 60% of the population (P) in an area needs to get infected to attain herd immunity and reduce further spreading [7] . Similar to the SIR model [8] , the total population P is compartmentalized into susceptible (S) and infected/recovered/deceased (I) group. Susceptible population at any day is calculated as 0:6P À I: A pixel value is calculated as ln DI þ 1 ð Þ=ln S þ 2 ð Þ. Total population is distributed spatially in similar fashion, and it is assumed time invariant within a short interval. Pixel value of population matrix is calculated as

Þ . Each frame is represented as tensor of dimension T 9 L 9 L 9 C, where T is the total time span and C is number of channels or features. As shown in Fig. 1b , each training sample in a frame is Fig. 1 a Illustration of overlapping frames obtained by spatially dividing a geographical region. The bold lines represent latitudes and longitudes which separate the grids. The box with dotted line represents the overlapping frame and it is represented as an image for training the model. Each grid is divided in certain number of pixels. The margin refers to number of pixels in the overlapping region. b Illustration of sequence of images in a frame. t-0 is the most recent image. X train , Y train are the training samples and X test , Y test are testing samples generated by sliding a time window size of W ?1 by 1, leaving behind a test case sample of time window size of W 0 in the most recent period. Number of training samples in a frame can be calculated as T À W 0 À W À 1. Thus, total number of training samples S train for all frames can be calculated as S train ¼ T À W 0 À W À 1 ð Þ Ã M * N. The forecasting problem is framed as supervised learning problem. Given a sequence of observed matrices of spatial data as images X 1 ; X 2 . . .X t , the final objective of the model is to predict the next image X tþ1 . The training samples are divided into input sequences and output sequences each of length W. The input sequence is time lagged images of output sequence. The model forecasts the normalized log transformed new infection count in each pixel in a frame for each timestep. Thus, the output frame consists of only 1 channel. The input training dataset (X train Þ can be represented as a tensor of size S train Â W Â L Â L Â C and the output dataset (Y train ) as S train Â W Â L Â L Â 1. For training, the input sequences are selected from all frames having nonzero total infection count. Figure 1b illustrates the sequence of images in a frame. The image t-7 to t-3 represents an input training sequence (X train Þ of length W. The output image sequence (Y train ) for this training sample is t-6 to t-2. Other training samples are generated by sliding the window W backwards in time by 1. The most recent images t-0 and t-1 represent the test output images (Y test ), and immediate sequence of images t-6 to t-2 is the test input sample (X test ). The test set X test is represented by a tensor of size M * N ð ÞÂW Â L Â L Â C and Y test by M*N ð ÞÂW 0 Â L Â L Â 1.

The model comprises of a Convolutional LSTM network [4] configured to take multichannel input such that distinct features can be passed through different channels. Multiple convolutional LSTM layers are stacked sequentially to form a network with high representational capability. The network terminates with a 3D convolutional layer having one filter which constructs a single channel output as the next frame prediction. A single model may be prone to overfitting on training dataset and loose stability in terms of prediction made. Creating an ensemble of diverse models intended to solve the same task, and combining the predictions made by them typically improves test accuracy and stability [9] . We used bootstrap aggregation [10] to create an ensemble of models. 60% random samples are drawn with replacement from the original training dataset, and an ensemble of five models is trained individually. During prediction the output of each of the models is weighted as per following equation.

o is output of the ensemble, o n is output of the model n, E is set of all models in the ensemble, I n train is number of infected patients in the training samples of model n, MSE n is mean squared error of model n on validation dataset, and r is softmax function. The weights are proportional to the amount of positive cases used for training the model and inversely proportional to the validation error.

During testing the model is given a sequence of most recent frames as input, and the next frame is predicted. The predicted frame is temporally appended to the input sequence of frames and fed to the model again to obtain the next predicted frame. This continues until required number of future frames are predicted. The accuracy of a model is tested with the metric ''mean absolute percent error'' (MAPE) and Kullback-Liebler (KL) divergence [11] . The pixel values are transformed to DI and summed up cumulatively to calculate total infection cases I, up till a specific day. MAPE is calculated at pixel level for total infection cases at the end of prediction period and averaged. The pixels with 0 susceptible population count are ignored while calculating MAPE. Pixel MAPE is calculated as per Eq. 2, where G p is set of all unique pixels in all grids such that the frame for each corresponding grid has nonzero total infection count, W 0 is prediction time period, W 00 ¼ T À W 0 is total time period in training set, p is a pixel from a set of unique pixels in the total region having nonzero actual susceptible population, p i is predicted pixel value on i th day, S i p is susceptible population at pixel p on i th day calculated from p i and S iÀ1 p , DI i p is actual new infection count at pixel p on i th day, and N p is total number of pixels p in the region.Î p and I p are total predicted and actual infection cases, respectively.

KL divergence at pixel level is calculated for total infection cases at the end of prediction period to measure the dissimilarity of distribution of predicted infection cases with respect to actual. r is softmax function applied after scaling a series in 0 to 1 scale, and P X ð Þ is probability distribution of X. Softmax is applied to convert total infection cases as probability distribution across pixels.

MAPE is also calculated at country level with respect to total predicted infection cases across the region during the prediction period.

Experiments have been carried out to predict the future new infection cases in Italy for a period of 5 days and 10 days and in USA for a period of 5 days and 8 days.

Data have been collected from Harvard Dataverse [12] [13] [14] .

For USA the data collection period is '2020-03-09' to '2020-04-08' and for Italy it is '2020-02-05' to '2020-04-10'. Test data period for Italy data is '2020-04-01' to '2020-04-10' and for USA it is '2020-04-01' to '2020-04-08'. Figure 2a , b shows the region of USA and Italy which has been divided into grids. The length W of each training input sequence is taken as 10 days. For both the countries, the frames containing at least a single Covid-19 infection case are considered for training and testing the model. Each frame in turn is divided in 16 9 16 pixels with an overlap Margin of 4 pixel. The model consists of ensemble of 5 Convolutional LSTM networks. For Italy each network contains 4 hidden layers with sigmoid activation. The output layer is a Convolutional 3D layer with exponential linear unit as activation. The models are trained for 30 epochs with mean squared error as loss function. The input and hidden layers have 32 filters. The input layer is configured to take images of size 16 9 16 9 2. The second channel is fed with normalized population data. The region in USA is approximately 13 times than Italy, and the distribution of Covid-19 cases in USA is geospatially highly skewed. Thus, grids are divided in four equal sections by latitude and longitude with each section containing 9 9 15 grids. A set of 4 heterogenous ensembles are trained for each of the 4 sections. The configuration of networks is same as that of Italy except they contain 2 hidden layers and each network is trained for 20 epochs. Table 1 shows the performance of the models in terms of KL divergence and MAPE. For both USA and Italy, low KL divergences state that the predicted geospatial probability distribution of total infection cases nearly matches with the actual probability distribution. The pixel level MAPE for Italy stays below 30%. For USA in 8-day forecasting period MAPE is 44% as there are many pixels in USA with low total patient count. A slight deviation in the prediction for these pixels shoots up the MAPE. Country level MAPE is low for both Italy and USA. Figures 3a and 4a shows predicted versus actual total Covid-19 cases for a period of 8 and 10 days in USA and 

An ensemble of Convolutional LSTM-based spatiotemporal epidemic spread model has been proposed for shortterm forecasting of Covid-19 spread. Experiments done on data obtained for USA and Italy reveal acceptable prediction accuracy with high resolution. Since the model has option to fed in any number of external features, we are experimenting with multiple external features that might influence the spread. This might help to find important 

Optimal vaccine distribution in a spatiotemporal epidemic model with an application to rabies and raccoons

Demographic variability, vaccination, and the spatiotemporal dynamics of rotavirus epidemics

A deep residual network integrating spatial-temporal properties to predict influenza trends at an intraurban scale

Convolutional LSTM network: a machine learning approach for precipitation nowcasting

Temperature and Latitude Analysis to Predict Potential Spread and Seasonality for Covid-19

Modelling and Prediction of the 2019 Coronavirus Disease Spreading in China Incorporating Human Migration Data

Herd immunity: a rough guide

Qualitative analyses of communicable disease models

Why M Heads are Better Than One: Training a Diverse Ensemble of Deep Networks

Bagging predictors

On information and sufficiency

US COVID-19 Daily Cases with Basemap

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations

Log transformed Predicted versus Actual total Covid-19 cases after a period of 10 days in each pixel in Italy. b Log transformed Predicted versus Actual total Covid-19 cases after a period of 8 days in each pixel in USA