key: cord-0056065-hmfjkohp
authors: Bhunia, Gouri Sankar; Roy, Santanu; Shit, Pravat Kumar
title: Spatio-temporal analysis of COVID-19 in India – a geostatistical approach
date: 2021-02-06
journal: Spat
DOI: 10.1007/s41324-020-00376-0
sha: 903da82c34a8cc7bc26401988388d9e2c782a293
doc_id: 56065
cord_uid: hmfjkohp

Coronavirus disease (Covid) is an infectious severe acute respiratory syndrome that spreads primarily between people during close contact, most often through small droplets produced by coughing, sneezing, and speaking. A retrospective surveillance study was conducted in India from 30th January to 21st March 2020 to gain insight into Covid's epidemiology and spatial distribution. Voronoi statistics are used to highlight the affected states from a series of polygons. Spatial patterns of disease clustering are analyzed using global spatial autocorrelation techniques. Local spatial autocorrelation is also examined using the Getis-Ord Gi* statistic. The findings show that disease clusters exist in the study area. Most of the clusters are concentrated in the central and western states of India, while the north-eastern states still show predominantly low-rate clusters. This simulation technique helps public health professionals to identify risk areas for the disease and to take decisions in real time to control this viral disease.

A novel coronavirus (nCoV) was identified in December 2019 in Wuhan, Hubei, China, and subsequently renamed SARS-CoV-2; it causes severe acute respiratory syndrome [1]. Covid is a pandemic that is actively expanding throughout the world and poses a unique challenge to community healthcare, the economy and lifestyles. Countries have tried many tactics to minimize the spread of Covid, such as banning gatherings, closing schools, stopping transportation, locking down towns, enforcing curfews, and sealing places, but have been unable to contain it effectively [2]. Site-level risk assessment is needed now so that prompt preventive measures can be taken. Globally, there are 48,93,195 cases of coronavirus, while the death toll is 3,22,861. According to the Ministry of Health update of Wednesday (May 20, 2020), India had recorded a total of 1,06,750 Covid cases, which include 61,149 active cases, 42,298 recoveries and 3303 deaths [3]. In the preceding 24 h, there were 5611 new cases and 140 deaths. The recovery rate is 39.62%.

The geographical distribution of disease is a key element of epidemiologic research, as demonstrated by the importance given to the "person, place and time" descriptors of health events in classical epidemiology textbooks [4]. Geographical information systems (GIS) have revolutionized this field: in simple terms, they provide the ability to view spatial or geographic information in a meaningful way, be it through interactive maps or other infographics. There are numerous uncertainties in the Covid pandemic, and many of them have a spatial component, which allows the epidemic to be interpreted geographically and mapped [5]. In the battle against Covid in India, Covid risk maps and applications of spatial epidemiology have been limited [6]. For these reasons, with the emergence of Covid as a global pandemic, the use of geospatial and statistical methods has become extremely important. Statistical modelling and small-area spatial epidemiology have been developed to determine where disease clusters and hotspots are located. Some of the principal spatial techniques reviewed by Robertson [7], and used throughout emerging infectious disease research, are spatial autocorrelation, space-time interaction, hotspots and clusters.

Advances in geostatistical methods have substantially improved the efficiency of processing and analyzing complex georeferenced data with multiple variables at different geographical scales, providing epidemiologists with new instruments to incorporate space and place in their studies [8, 9]. Public health authorities use disease prevalence maps as a guide for monitoring and prevention programmes, to identify areas of excess and their possible causes (e.g. environmental exposures or socio-demographic factors). Three major hurdles lie in interpreting and analyzing choropleth maps: (1) extremely unreliable rates, which usually occur for sparsely populated regions and/or where Covid is less frequently observed; (2) visual bias due to the aggregation of health data within administrative units of widely varying sizes and types; and (3) a mismatch of spatial supports between disease occurrence and explanatory variable data, which prevents the direct use of correlational analysis.

Geostatistical algorithms have been developed to filter local small-scale variations from cluster maps, thereby enhancing regional trends at a larger scale [10, 11]. These methods differ greatly in their computational requirements and in their underlying assumptions about the spatial patterns and distribution of risk values. Cluster analysis continues to be an important exploratory technique in scientific inquiry [12]. Spatial cluster detection depends on the geography of the events, which requires a correct and meaningful treatment of space and spatial relations, combined with the observed location and event attributes. To date it has involved specific methods and techniques for structuring and accounting for distance, outskirts, contiguity, geographical irregularity and so on. Understanding the distribution of Covid cases in India through a geostatistical approach will help inform Covid control programmes at a smaller scale. The specific objectives are to use spatial autocorrelation techniques to analyze the spatial pattern of Covid and to identify clusters with statistically significant hotspots of the disease. The present study focuses on basic geostatistical approaches to the clustering pattern of Covid in India.

The data incorporated in this study include all information reported by the Government of India up to 21st May 2020 [3]. The study covers all states and Union Territories (Fig. 1). The confirmed Covid cases and deaths due to Covid, along with the transmission types, are considered. The skewness of the Covid incidence data is measured in terms of the third moment about the mean of the distribution. If the distribution is symmetric, the skewness is zero. The kurtosis describes the peakedness of the distribution and is measured by the fourth moment about the mean. Distributions with kurtosis lower than 3 are known as platykurtic.
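The moment-based definitions above can be illustrated with a minimal sketch. It assumes the standardized third and fourth central moments with population (divide-by-n) normalization; the paper does not state which normalization was used, and the function name and case counts are hypothetical.

```python
def moment_skew_kurtosis(values):
    """Return (skewness, kurtosis) from central moments about the mean.

    Kurtosis is the non-excess form, so a normal distribution gives about 3.
    Population (divide-by-n) moments are an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; moments are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical state-level case counts (illustrative only, not the study data).
cases = [0, 0, 1, 1, 2, 3, 5, 40]
skew, kurt = moment_skew_kurtosis(cases)

# Classify the peakedness relative to the normal-distribution value of 3.
if kurt < 3:
    shape = "platykurtic"
elif kurt > 3:
    shape = "leptokurtic"
else:
    shape = "mesokurtic"
```

With one state far above the rest, as in the hypothetical counts, the skewness is positive and the kurtosis exceeds 3.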

Voronoi polygons are created so that every location within a polygon is closer to the sample location of that polygon (the affected state) than to any other sample location. After the polygons are created, the neighbours of an affected state are defined as any other sample locations whose polygons share a border with that of the chosen state. The Voronoi map tool provides a number of local statistics (mean, median, entropy, inter-quartile range) that can be assigned to or calculated for the polygons [13]. The value assigned to each polygon is the average calculated from the state and its neighbouring states, which is used for further processing and analysis.
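The neighbourhood averaging described above can be sketched as follows. The sketch assumes that the polygon adjacency has already been derived (for example from a GIS layer); the unit names, values and the helper `local_statistic` are hypothetical and do not reproduce the Voronoi map tool itself.

```python
from statistics import mean, median


def local_statistic(values, neighbours, stat=mean):
    """Apply `stat` to each unit's value together with its neighbours' values."""
    return {
        unit: stat([values[unit]] + [values[nb] for nb in neighbours[unit]])
        for unit in values
    }


# Hypothetical units and border-sharing adjacency (illustrative only).
values = {"A": 10, "B": 20, "C": 60, "D": 0}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

local_mean = local_statistic(values, neighbours)
local_median = local_statistic(values, neighbours, stat=median)
```

Passing a different function as `stat` gives the other local statistics mentioned in the text, provided an implementation of that statistic is available.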

Directional distribution, namely the standard deviational ellipse (SDE), was calculated for each period to capture the directional pattern and to provide information on the compactness and orientation of the dispersion of Covid infections. Measuring the standard distance separately in the x- and y-directions is a common way of measuring the trend for a set of areas [14]. Each of these two measures defines an axis of an ellipse that encompasses the distribution of features [15]. The SDE calculates the standard deviation of the x-coordinates and y-coordinates from the mean centre in order to define the axes of the ellipse. The ellipse enables one to see whether the distribution of features is elongated and therefore has a particular orientation. For the Covid infections, we used one standard deviation, which covers approximately 68% of all input features [16]. In order to compare the spatial patterns of the Covid infections and local sources, a series of additional measurements, including the axial ratio and the coordinates of each ellipse, were collected at six-day intervals.
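A minimal sketch of an SDE computation is given below. It uses a covariance-based formulation (eigenvalues of the 2 x 2 population covariance matrix of planar coordinates), which is an assumption of this sketch: GIS packages may apply additional scaling factors or case weights, and the function name and sample points are hypothetical.

```python
import math


def standard_deviational_ellipse(points):
    """Return (centre, major, minor, bearing_deg) for a list of (x, y) points.

    major and minor are one-standard-deviation semi-axes taken from the
    eigenvalues of the population covariance matrix; bearing_deg is the
    rotation of the long axis, clockwise from north, in [0, 180).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Closed-form eigenvalues of the symmetric 2 x 2 covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))

    # Angle of the long axis counter-clockwise from east, converted to a
    # clockwise-from-north bearing.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    bearing = (90.0 - math.degrees(theta)) % 180.0
    return (cx, cy), major, minor, bearing


# Hypothetical projected coordinates, elongated west-east (illustrative only).
pts = [(-2, 0), (2, 0), (0, -1), (0, 1)]
centre, major, minor, bearing = standard_deviational_ellipse(pts)
axial_ratio = major / minor
```

For the hypothetical points the long axis lies west-east, so the bearing is 90 degrees and the axial ratio is 2.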

The ArcGIS 10.0 statistical tools for global autocorrelation (Moran's I) and Getis-Ord Gi* were used to identify statistically significant Covid clusters across the states of India. Moran's I evaluates the spatial autocorrelation of Covid cases across geographical areas: a value close to zero means that the disease is spatially random, while a positive value suggests spatial clustering [17]. The statistic is tested against the null hypothesis that the regional distribution of Covid is completely spatially random, in order to determine whether the spatial pattern is clustered, dispersed or random; a statistically significant estimate of Moran's I (p < 0.05, z score > 1.96) suggests that neighbouring districts have similar numbers of Covid cases. The optimized high-low clustering based on the Getis-Ord Gi* statistic identifies statistically significant clusters (e.g., states with high Covid cases). Spatial outliers comprise high-low outliers (a high value among low-value states) and low-high outliers (a low value among high-value states). A Getis-Ord Gi* value less than 1 indicates positive spatial autocorrelation, whereas a value greater than 1 points to negative spatial autocorrelation [18].
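For illustration, the global Moran's I can be sketched in its standard textbook form, I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z_i is the deviation of unit i from the mean and S0 is the sum of all weights. This is not the ArcGIS implementation; the binary border-sharing weight matrix and the values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` (list) and `weights` (n x n nested list)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical chain of four states in which 0-1, 1-2 and 2-3 share borders.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([10, 10, 0, 0], w)    # similar values side by side
alternating = morans_i([10, 0, 10, 0], w)  # high and low values interleaved
```

In this toy layout the clustered arrangement gives I = 1/3 and the alternating arrangement gives I = -1, which matches the interpretation that positive values indicate clustering of similar values.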

To identify the spatial relationship, a fixed distance band is used, in which each state boundary is analysed within the context of its neighbouring state boundaries. Neighbouring state boundaries within the specified critical distance receive a weight of one and exert influence on the computation for the affected/non-affected state. Neighbouring state boundaries outside the critical distance receive a weight of zero and have no influence on the computation. This method yields z-scores and p-values, which indicate whether the null hypothesis can be rejected. For statistically significant positive z-scores, the larger the z-score, the more intense the clustering of highly Covid-affected states (hotspot). For statistically significant negative z-scores, the smaller the z-score, the more intense the clustering of low Covid incidence (cold spot). Moreover, very high positive or very low negative z-scores are associated with very small p-values, which indicate that the observed spatial pattern is unlikely to reflect the theoretical random pattern represented by the null hypothesis. We selected a confidence level of 99 percent, which means that we do not reject the null hypothesis unless the probability that the pattern was created by random chance is less than 1 percent.
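The fixed-distance-band weighting and the resulting z-scores can be sketched as below. The sketch assumes the standardized form of the Gi* statistic with binary weights (one inside the band, including the unit itself, and zero outside) and a two-sided normal p-value; the coordinates, values and function names are hypothetical, and the band must be smaller than the study extent so that the denominator is non-zero.

```python
import math


def getis_ord_gi_star(coords, values, band):
    """Gi* z-scores using binary fixed-distance-band weights (self included).

    Assumes non-constant values and a band that does not cover every unit.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        # Weight of one inside the critical distance, zero outside.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * v for wj, v in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical centroids on a line, with high counts at the western end.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
cases = [100, 90, 10, 5, 8, 2]
z_scores = getis_ord_gi_star(coords, cases, band=1.0)

# Units significant at the 99 percent confidence level, if any.
significant = [i for i, z in enumerate(z_scores) if two_sided_p(z) < 0.01]
```

A positive z-score at the western end and a negative one at the eastern end correspond to the hotspot and cold spot interpretation given in the text.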

Areal interpolation covers a wide variety of methods that estimate an aggregate attribute for one system of units (in this case, newly created polygons) from the attribute data of another, spatially incongruent system (in this case, the original polygons). The initial units for which the attribute is known are called source units, and the units for which the attribute has to be estimated are called target units [19, 20]. The proposed pycnophylactic areal interpolation algorithms [21] are based on different assumptions regarding the underlying distribution of Covid cases and rely only on the data being estimated. A lattice spacing is defined to estimate the point covariances of Covid incidence: each state is overlaid with a square lattice, and a point is assigned to each intersection in the lattice. The number of cases in the source units is redistributed to the target units using as a weight the area that each source unit contributes to the target area [22]. The approach is an improvement over the area-weighting method because it does not assume a homogeneous distribution of cases, and the continuous surface eliminates sharp transitions in Covid case estimates across state boundaries.
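The simple area-weighting baseline mentioned above, not the pycnophylactic or geostatistical method itself, can be sketched as follows. It assumes that cases are spread uniformly within each source unit and that the intersection areas have already been computed; all unit names, areas and counts are hypothetical.

```python
def area_weighted_counts(source_counts, source_areas, overlaps):
    """Reallocate counts from source units to target units by overlap area.

    overlaps maps (source, target) to the area of their intersection.
    Assumes cases are spread uniformly within each source unit.
    """
    targets = {}
    for (src, tgt), area in overlaps.items():
        share = area / source_areas[src]
        targets[tgt] = targets.get(tgt, 0.0) + source_counts[src] * share
    return targets


# Hypothetical source units (S1, S2) and target units (T1, T2).
source_counts = {"S1": 100, "S2": 40}
source_areas = {"S1": 10.0, "S2": 8.0}
overlaps = {("S1", "T1"): 6.0, ("S1", "T2"): 4.0, ("S2", "T2"): 8.0}

estimates = area_weighted_counts(source_counts, source_areas, overlaps)
```

Because every source unit is fully covered by target units in this example, the total number of cases is preserved after reallocation (60 in T1 and 80 in T2).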

The predictive accuracy is measured using the root mean square error (RMSE) based on the interpolated value ŷ_j at a distance of 5 km, as follows:
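Assuming the standard definition of the RMSE, with the interpolated value written as $\hat{y}_j$ and with $y_j$ denoting the observed value at intersection unit $j$ (a symbol that is not defined in the text), the expression takes the form:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{J}\sum_{j=1}^{J}\left(\hat{y}_j - y_j\right)^{2}}
```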

where j indexes the intersection units and J is their total number. In this case, the spatial autocorrelation distance between the two models has an effect. Our findings show that spatial autocorrelation in the field needs to be taken into account.

All statistical data were processed using Microsoft Excel version 11.0. Continuous variables are reported as the mean, standard deviation or median, skewness, kurtosis and range, as appropriate (Table 1). The mean number of Covid cases increased dramatically from 30th January 2020 to 21st May 2020. In January, the mean incidence of Covid cases was 0.03, whereas on 21st May 2020 the estimated average Covid incidence was 836.84. A highly skewed distribution of Covid was observed from 30th January 2020 to 16th March 2020. The skewness of the dataset varied between 1.50 on 22nd March 2020 and 5.83 on 30th January 2020. All data were positively skewed, indicating that the right tail is longer than the left tail. Kurtosis is associated with the tails, shoulders and peakedness of the distribution. The value of kurtosis ranged between 4.44 and 35.03; values above 3 indicate a leptokurtic distribution. Sample size-weighted measures are beneficial, since we expect a survey variable to represent the population better as the sample size increases. To account for this, measurements from large samples are given a higher weight than those from smaller samples.

The SDE method creates a new elliptical polygon, centred on the mean centre, for all states. The output ellipse polygons carry attribute values for the two standard distances (long and short axes) and the ellipse orientation (Table 2). The orientation is the rotation of the long axis measured clockwise from noon. Two ellipses were generated for the periods between 10th March 2020 and 09th April 2020 and between 27th April 2020 and 09th May 2020. Figure 2 shows the series of directional distributions of Covid-infected patients in each week (from 30th January 2020 to 21st May 2020). Plotting ellipses for the Covid outbreak during the study period may provide insight into the spread of the disease, which may be useful in deploying mitigation strategies. The districts are spatially normally distributed (that is, most concentrated in the middle and increasingly sparse towards the periphery).

The shapes of the ellipses differ from one week to another; the ellipse is generally oriented along the west-east direction. From 30th January 2020 to 16th March 2020, both the long and short axes became larger and were oriented west-east, indicating that most of the cases were found in the western and eastern states of India and were dispersed within the states (Fig. 2). From 10th March 2020, a small ellipse was oriented in the north-south direction, and both its long and short axes were smaller, which means that the standard deviations of the ellipse were decreasing and Covid cases were distributed in some northern and southern states (Fig. 2c). On 16th March 2020, the north-south SDE shifted towards the east (Fig. 2d). On 22nd March 2020, three SDEs were generated: one oriented south-west to north-east with large axes, one oriented north-east to south-west with small axes, and one with very small axes oriented east-east-south to north-north-west. This indicates that Covid cases were distributed across most of the states of India and were mostly concentrated in the north and north-west. From 28th March to 04th April 2020, two SDEs were generated, oriented south-west to north-east and south-east to north-west. However, the ellipse oriented south-east to north-west was comparatively smaller, which indicates a maximum concentration of the disease in this particular state. From 6th April 2020 to 15th April 2020, most of the cases were concentrated in the central and southern states of India. From 15th April 2020 (Fig. 2m-o), the SDEs shifted towards the west and south and were oriented south-west to north-east, indicating that the concentration of cases was highest in this region.

The Global Moran's I method is an inferential statistic, meaning that the study's findings are interpreted in the context of its null hypothesis. The null hypothesis of the Global Moran's I statistic states that the analyzed attribute is randomly distributed among the states. Moran's I and Getis-Ord Gi* are two special cases of the general cross-product statistic that measures spatial autocorrelation. Table 3 summarizes the spatial autocorrelation results calculated through the Moran's I and Getis-Ord Gi* statistics for weekly Covid infections. The global Moran's I test gave statistically significant results (z scores above 1.96), which suggest spatial heterogeneity. The global Covid autocorrelation figures between 30th January 2020 and 21st May 2020 were also statistically significant. The spatial autocorrelation of the Covid distribution, with values between 0.326 and 0.662, was relatively unstable over time. A positive value of Moran's I suggests positive spatial autocorrelation, which means that similar values, whether high or low, occur near one another. The largest Moran's I value indicates the strongest spatial autocorrelation of Covid-affected states. In this study, the distance band was 10 km, and the spatial clusters were studied further. As the data were not normally distributed, the Box-Cox transformation was used, with an exponent defined to transform the data into a normal distribution. The transformed data can also remove the effect of extreme values on the spatial cluster analysis. The autocorrelation among the states was lower from 30th January to 10th March 2020. From 16th March to 15th May, higher values of Moran's I were calculated, with high z-scores and p-values significant at the 99% confidence level; however, the estimated value of Moran's I was lower on 21st May 2020. The Getis-Ord Gi* tool evaluates each Covid-infected state and compares the local situation in the neighbouring states with the global situation.
The values derived through the Getis-Ord Gi* statistic, with z-scores and p-values, are presented in Table 3. These results indicate that there was positive spatial autocorrelation. All the z-score values were significant at the α < 0.01 level; hence, we could reject the null hypothesis. The high and/or low values of Covid in the data set were more spatially clustered than would be expected if the underlying spatial processes were random. The highest values of Getis-Ord Gi* and z-score were calculated for 16th March 2020 and from 15th April 2020 to 05th May 2020 (Table 3). The calculated values of Getis-Ord Gi* from 09th May to 21st May 2020 are also presented in Table 3. The lowest values of Getis-Ord Gi* were observed from 30th January to 04th March 2020.
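The Box-Cox transformation applied before the cluster analysis can be sketched in its standard one-parameter form, (x^λ - 1) / λ for λ ≠ 0 and ln(x) for λ = 0. The exponent used in the study is not reported, so the value 0.5 below, the shift of 1 for zero counts and the case counts are all assumptions made for illustration.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a single positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical case counts; the +1 shift for zero counts is an assumption.
counts = [0, 3, 15, 120]
transformed = [box_cox(c + 1, 0.5) for c in counts]
```

The transformation compresses large counts more strongly than small ones, which is how it reduces the influence of extreme values on the cluster statistics.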

Our primary aim in this analysis was the detection of Covid hotspots using statistical methods for potential intervention. By considering the number of Covid cases for each state and the average measurement, a prediction surface was produced for the value of the Gaussian variable at all states in the data domain. In this interpolation, the empirical covariances were fitted within the 90% confidence interval. The exponential model appears to fit the data very well; most of the covariances fall within the confidence intervals. The searching neighbourhood of the predicted value of 0.0006119 was fixed for the Covid incidence rates, along with a smoothing factor of 0.2. The details of the model parameters (lattice spacing, lag size, mean, major range, RMSE, and average standard error) are given in Table 4. The lattice spacing of the model varied between 1.6417 and 1.8808 during the study period. The lattice spacing parameter specifies the horizontal and vertical distance between the central locations of the infected states. The partial sill increased over time. This indicated a chance of about 33 percent for the predicted Covid incidence value of any state at that location. The mean value of the Covid model varied between -0.0027 and 14.3895. Negative mean values were calculated for 30th January 2020, 10th March to 28th March 2020, and 21st April 2020. The estimated root mean square varied between 0.7996 and 7.289 during the study period. This indicates an ideally good prediction surface where the value was close to 1. The value of the root mean square error (RMSE) increased gradually over the study period, except on 15th May 2020 (7.289). The average standard error in the study area also increased gradually, except on 09th April 2020.
Figure 3 portrays the spatial distribution of the areal interpolation of Covid cases between 30th January 2020 and 21st May 2020, used to re-aggregate Covid cases by downscaling or upscaling within the state boundaries. The areal interpolation tool predicts the average value of Gaussian distribution (with prediction standard April, 2020, Covid VHR were mostly concentrated in the central and western parts of the nation, and HR areas shifted towards the east and spread to the north and south of India. From 21st April to 03rd May 2020, the VHR zones were identified in the central and south-western parts of India, and the HR zone extended across the entire eastern zone of India. From 03rd May 2020, the pattern was almost the same; however, low-risk areas of Covid were identified in the states of Chhattisgarh and Odisha. Moreover, the HR zones extended across the entire south and the extreme north-west of India. [23]. Few previous ecological studies have been conducted in India concerning Covid problems and their links to risk factors, and only a very limited number of studies have examined the spatio-temporal pattern of Covid in India. The present study of the spatial clustering of one of India's leading health issues could help to understand the spatial epidemiology of Covid, so that health departments can implement regional prevention and control strategies. Traditional statistical analyses without displays or graphics are of limited usefulness, particularly for advising policymakers on priority issues. Data encoded as graphics or images can be displayed more intuitively. Therefore, our study focused on visualizing the Covid distribution using GIS and geostatistical techniques. Spatial autocorrelation analysis and spatial cluster analysis were carried out to identify the spatio-temporal pattern of Covid clusters.

The use of spatial analysis techniques for infectious diseases is not new [24] [25] [26]. Spatial autocorrelation measurement is useful for the cluster analysis of regional health issues [27, 28]. Positive values of the Moran's I and Getis-Ord Gi* statistics indicate spatial autocorrelation in the form of clusters (clustering of similar neighbouring states). A significant positive spatial autocorrelation was observed in our analysis, which corroborates earlier results [29]. The neighbours of each state boundary were established by edge contiguity, which gives weight to neighbours that share an edge and none to all the others. The z scores of 5.43 and 8.33 for January and April, respectively, suggest that there is less than a 1 percent probability that the observed clustered pattern is the result of random chance. The hypothesis of complete randomness is rejected, as there are high numbers of Covid cases in neighbouring places in urban and peri-urban areas. This indicates that Covid cases are spatially clustered throughout metropolitan and peri-urban areas. Identifying these Covid hotspot areas may help to manage and supervise regional Covid prevention programmes. This will help policymakers analyse spatial risk factors and work out health care strategies for the preparation and implementation of health services. Areal interpolation has been demonstrated to be a promising method for defining endemic disease clusters [30] [31] [32] [33]. The projection of areal interpolation moves the complicated structure from high-dimensional regions into lower-dimensional clusters, which is essential for clustering endemic disease areas based on neighbourhood relations. The integration of local interpolation and GIS successfully produces dynamic visualisation, which in turn helps public health officials to decide on Covid management in a timely manner. Most of the infections are concentrated in the central and southern states of India.
The spatial distribution pattern of Covid cases is significantly clustered, with clusters identified in the north-central and extreme north of India. Hotspot areas are mainly distributed in Maharashtra, Telangana, Karnataka, Gujarat, Madhya Pradesh and Rajasthan, and cold spot areas are distributed in the north-east of India. In future, spatial autocorrelation analysis can help to understand the temporal dependence.

There are still some limitations to this analysis. First, the present study does not include the migratory population and other socio-economic parameters. In order to assess the associations between Covid and its potential factors in India, further data collection at the local level needs to be undertaken. Second, the study period is very short (i.e., three months). To evaluate the spatial and temporal patterns of Covid, additional information is needed over a longer duration or with a more comprehensive chronological incidence.

In this study, GIS and spatial modelling have been used to analyze and show the spatial patterns of Covid, as in many epidemiological studies. The Moran's I and Getis-Ord Gi* techniques have been used to examine the spatial pattern and distribution pattern. The findings indicate that the geographical distribution of Covid in India is heterogeneous, with a particular concentration in central and western India. The identification of Covid hotspots in India (Maharashtra, Madhya Pradesh, Telangana and Rajasthan) with area-based interpolation will help provincial health officers to improve their remedial action and to establish potential strategies for better management of the disease. Such spatial and temporal clusters can also help to empower and support highly efficient, locally adapted procedures for the highly spatially heterogeneous Covid disease. Similarly, our study indicates that, by showing where and when available public health resources should be focused, spatial and temporal analyses of population-based disease surveillance data will aid in managing viral diseases such as Covid.

Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Geographical tracking and mapping of coronavirus disease COVID-19/severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic and associated events around the world: how 21st century GIS technologies are supporting the global fight against outbreaks and epidemics

Modeling geographical spread of COVID-19 in India using network-based approach

Georeferenced data in epidemiologic research

Geo-mapping of COVID-19 risk correlates across districts and parliamentary constituencies in India. Harvard Data Science Review

Spatial analysis and GIS in the study of COVID-19. A review

An overview of spatial analysis of emerging infectious diseases

Small-area spatial statistical analysis of malaria clusters and hotspots in Cameroon

Spatial statistics and influencing factors of the novel coronavirus pneumonia 2019 epidemic in Hubei Province, China. Research Square

Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging

Clustering analysis of countries using the COVID-19 cases dataset

Geostatistical COVID-19 infection risk maps for Portugal

Resample-smoothing of Voronoi intensity estimators

Statistical analysis of spherical data

Spatio-temporal analysis to identify determinants of Oncomelania hupensis infection with Schistosoma japonicum in Jiangsu province

A GIS model predicting potential distributions of a lineage: A test case on hermit spiders (Nephilidae: Nephilengys)

Spatio-temporal distribution and hotspots of hand, foot and mouth disease (HFMD) in northern Thailand

Spatial processes: Models and applications

The linkage of data describing overlapping geographical units

The local nature of a national epidemic: Childhood overweight and the accessibility of healthy food

Smooth pycnophylactic interpolation for geographical regions

Spatial interpolation methods: A review. The American Cartographer

Status update

Spatial autocorrelation analysis of health care hotspots in Taiwan in 2006

Rapid surveillance of COVID-19 in the United States using a prospective space-time scan statistic: Detecting and evaluating emerging clusters

Geographic differences in COVID-19 cases, deaths, and incidence - United States

Análise espacial sobre a dispersão da covid-19 no Estado da Bahia

COVID-19 en México: Comportamiento espacio temporal y condicionantes socioespaciales, febrero y marzo de 2020. Posición

Distribution and trend analysis of COVID-19 in India: Geospatial approach

Processing aggregated data: The location of clusters in health data

Mapping the incidence of the COVID-19 hotspot in Iran - implications for travellers

Modelling and predicting the spatio-temporal spread of coronavirus disease 2019 (COVID-19) in Italy

Spatial prediction of COVID-19 epidemic using ARIMA techniques in India

Conflict of interest There is no conflict of interest between the authors.