key: cord-0982364-c944zmxt authors: Gupta, Amitesh; Banerjee, Sreejita; Das, Sumit title: Significance of geographical factors to the COVID-19 outbreak in India date: 2020-06-17 journal: Model Earth Syst Environ DOI: 10.1007/s40808-020-00838-2 sha: a8c8bd9ec9fedb06c261d308c79c94f0c004cc53 doc_id: 982364 cord_uid: c944zmxt

Recently, the large outbreak of COVID-19 cases across the world has hit India, with about 30,000 confirmed cases within the first 3 months of transmission. The present study used long-term climatic records of air temperature (T), rainfall (R), actual evapotranspiration (AET), solar radiation (SR), specific humidity (SH) and wind speed (WS), together with topographic altitude (E) and population density (PD), at the regional level to investigate their spatial association with the number of COVID-19 infections (NI). Bivariate analysis found no significant relation (except for SR) with the number of infected cases across 36 provinces in India. Variable importance in projection (VIP) from the partial least squares (PLS) technique indicated higher importance of SR, T, R and AET. However, a generalized additive model fitted with the log-transformed input variables, with spline smoothing applied to PD and E, achieved high prediction accuracy (R² = 0.89) and thus explained well the complex heterogeneity in the association of regional parameters with COVID-19 cases in India. Our study suggests that comparatively hot and dry regions at lower altitudes of the Indian territory are more prone to COVID-19 infection.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s40808-020-00838-2) contains supplementary material, which is available to authorized users.

Coronavirus disease 2019 (COVID-19), already considered a global pandemic, is rapidly spreading across the world and significantly affecting many countries (Singhal 2020; Asyary and Veruswati 2020).
This outbreak of disease caused by a novel coronavirus (SARS-CoV-2) began in December 2019 in Wuhan, Hubei Province, China (Gorbalenya 2020; Ma et al. 2020; Wu et al. 2020). By March 25, 2020, the disease had rapidly spread from Wuhan to 196 countries in different parts of the world (Xu et al. 2020). As of April 28, 2020, there were a total of 3.12 million confirmed cases worldwide. This contact-transmissible disease has an average incubation period of 6 to 14 days (Tosepu et al. 2020). Fever, respiratory disorders, coughing and shortness of breath are some of the early symptoms, while in the acute stage the disease can even lead to death (Holshue et al. 2020; Perlman 2020; Tosepu et al. 2020). According to WHO, the first infected case in India was reported on January 30, 2020; from around March 4 onwards, it turned into a major outbreak. As of April 27, Maharashtra was the leading state with a total of 8590 cases, while the whole country recorded a total of 29,458 cases. In the absence of a vaccine, social distancing is the only measure adopted. SARS-CoV-2 can be transmitted through various bio-aerosols, large droplets or direct contact with secretions, similar to the influenza virus (Li et al. 2005; Qi et al. 2020). Virus transmission can be influenced by several geographical factors, such as climatic conditions (temperature and humidity) and population density (PD) (Dalziel et al. 2018; Casanova et al. 2010). The outbreak has been observed to be more severe in countries at mid-latitudes, where temperatures are considerably lower than in tropical countries. Many researchers from different parts of the world have tried to establish a relationship between COVID-19 transmission and various meteorological factors (Bashir et al. 2020; Prata et al. 2020; Shi et al. 2020).
In a study conducted in New York, USA, using Kendall and Spearman rank correlation tests, mean temperature, minimum temperature and air quality were found to be significantly associated with the COVID-19 pandemic (Bashir et al. 2020). Shi et al. (2020) reported a significant correlation between daily temperature and the daily count of COVID-19 cases in China and suggested that temperatures above 8–10 °C would lead to a decline in infected cases. Prata et al. (2020) concluded that a 1 °C rise in temperature would result in a decrease in the number of daily confirmed COVID-19 cases in Brazil. In India, no comprehensive study of the climatic influences on COVID-19 has been reported so far. Therefore, in this study, we investigated the correlations of climatic and topographic factors with the total number of infected cases in the different states of India. The main goal is to examine scientific evidence on the spread of COVID-19 cases in India based on regional factors, including PD, climatic conditions and topography. We retrieved the number of COVID-19 cases in all the states of India as of April 27, 2020 from https://www.covid19india.org/. PD data were acquired from the census India website (https://www.census2011.co.in). Owing to the limited availability of daily ground-monitored weather data in India, we obtained long-term annual climatic data [viz. temperature, rainfall, actual evapotranspiration (AET), wind speed (WS), solar radiation (SR) and specific humidity (SH)] from the TerraClimate and Worldclim websites (http://www.climatologylab.org/terraclimate.html). The Shuttle Radar Topography Mission (SRTM) digital elevation model at 90-m spatial resolution was obtained from the CGIAR website (http://srtm.csi.cgiar.org/).
The first part of our research aimed to characterize the relative climatic conditions of the different states. Hence, we implemented the De Martonne aridity-humidity index (De Martonne 1925). Although this method is more appropriate for smaller areas (Baltas 2007), its simple calculation and fair generalization make it suitable for regional classification (Ahmadi et al. 2020). Moreover, because it requires only temperature and rainfall data, which are readily available, the method is widely used (Zareiee 2014). The aridity index was computed by the following equation:

$$I_{DM} = \frac{P}{T + 10} \qquad (1)$$

where I_DM denotes the aridity index, P is the annual mean precipitation in mm, and T is the annual mean air temperature in °C. Initially, the Pearson product-moment correlation was applied to the number of infected cases and all the input variables to determine their inter-correlations. Bivariate linear regression was then used to test for significant relationships between the topo-climatic factors and COVID-19 transmission. Partial least squares (PLS) regression is a common method that reduces the predictor variables to a smaller set of uncorrelated components and runs least-squares regression on these components instead of the original data. PLS is therefore particularly useful when the predictor variables are collinear. PLS also provides a measure called variable importance in projection (VIP), which quantifies the relative importance of each factor (Akarachantachote et al. 2014). To compute relative importance easily, we applied PLS to our topo-climatic data to construct a model and determine the relative importance of the variables. The VIP score of variable j can be calculated using the following equation:

$$\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} R^2(y, t_a)\left(W_{aj} / \lVert W_a \rVert\right)^2}{\sum_{a=1}^{A} R^2(y, t_a)}} \qquad (2)$$

where W_aj denotes the weight of the jth factor in component a, W_a is the weight vector of component a, R²(y, t_a) indicates the fraction of variance in y explained by component a, p is the number of predictor variables and A is the number of components. Detailed methodology of PLS and VIP can be found in Wold et al. (1993) and Akarachantachote et al.
(2014). Generalized additive models (GAMs) have recently been used extensively and found useful for relating COVID-19 cases to various local meteorological parameters (Qi et al. 2020; Prata et al. 2020; Wu et al. 2020). In the present study, a log-linear GAM was applied to analyze the state-specific associations between infection counts and regional climatic factors, topography and PD. First, a basic model was built with the total number of infected cases as the response and all other input parameters as predictors. The parameters were then log-transformed, and a smoothing spline function was applied to PD and E in particular, because only for these two variables did the standard deviation exceed the mean, owing to extremely high heterogeneity at the regional level. The final model therefore combines linear terms in the log-transformed variables with spline-smoothed terms for PD and E. This approach also helped to explore the linear and nonlinear effects of the various parameters on health outcomes in terms of COVID-19 infections.

A total of 29,487 confirmed cases of infection were reported across India up to April 27, 2020. Maharashtra registered the highest number of confirmed cases (8590), while only 9 of the 36 provinces (comprising 28 states and 8 union territories) individually registered more than 1000 cases. PD in India varies from 17 to 11,320 persons km⁻² across the states and union territories (Table 1). Owing to the broad latitudinal differences among states, the climatic variables show high variability (Fig. 1). The annual mean temperature varies from −5 °C (Ladakh) to 28 °C (Puducherry) (Table 1), while the highest annual mean rainfall is observed in Meghalaya (3914 mm) and the lowest in Ladakh (164 mm). SH ranges from 0.002 to 0.015 kg kg⁻¹. The range of AET across the states is very wide (10.75–100.99 mm). Monthly mean WS at 10 m above the surface varies from 0.99 to 2.76 m s⁻¹. SR varies between 15,236 and 20,301 kJ m⁻² day⁻¹. Average elevation varies from 15 to 4661 m above mean sea level.
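The De Martonne index used for the climatic zoning, I_DM = P/(T + 10), is simple enough to compute directly. The Python below is a minimal sketch, not the authors' code. Both function names are hypothetical. The class boundaries (10, 20, 24, 28, 35 and 55) are an assumption taken from a commonly used extended De Martonne scheme and relabelled with this study's category names. The study's own Table 2 remains the authoritative classification.

```python
# Minimal sketch of the De Martonne aridity index, I_DM = P / (T + 10).
# ASSUMPTION: the class thresholds below follow a commonly used extended
# De Martonne scheme, relabelled with this study's category names; they
# are illustrative and not taken from the study's Table 2.

# (upper bound, label) pairs, checked in ascending order.
DE_MARTONNE_CLASSES = [
    (10.0, "arid"),
    (20.0, "semi-arid"),
    (24.0, "moderate"),
    (28.0, "semi-wet"),
    (35.0, "wet"),
    (55.0, "very wet"),
]


def de_martonne_index(precip_mm: float, temp_c: float) -> float:
    """Return I_DM for annual mean precipitation P (mm) and temperature T (degC)."""
    denominator = temp_c + 10.0
    if denominator <= 0:
        # The index is undefined (or meaningless) for T <= -10 degC.
        raise ValueError("De Martonne index is undefined for T <= -10 degC")
    return precip_mm / denominator


def classify(index: float) -> str:
    """Map an index value to a climate class (hypothetical thresholds)."""
    for upper, label in DE_MARTONNE_CLASSES:
        if index < upper:
            return label
    # Anything at or above the last bound is the wettest class.
    return "extremely wet"


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    value = de_martonne_index(1000.0, 25.0)  # 1000 / 35
    print(round(value, 2), classify(value))
```

With these hypothetical inputs the index is about 28.57, which falls in the "wet" band under the assumed thresholds. Raising an error for T ≤ −10 °C is a deliberate choice, because the denominator would otherwise be zero or negative.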
Based on the De Martonne classification (Table 2), we found six climatic zones across India (i.e., semi-arid, moderate, semi-wet, wet, very wet and extremely wet) (Fig. 2). Under this classification, five provinces fall in the semi-arid category, two in moderate, three in semi-wet, five in wet, seven in very wet and thirteen in extremely wet. The spatial distribution of COVID-19 cases in India indicates that most transmission occurred in states in the semi-arid and wet categories, whereas provinces in the very wet (7) and extremely wet (13) categories appear less affected (Fig. 2). To understand the influence of the different climatic and topographic factors, we performed bivariate correlation analysis using the long-term climatic data and topographic elevation. Table 3 shows the Pearson correlation coefficients between all pairs of variables. The number of infections was treated as the dependent variable and all the geographical parameters as independent variables. Temperature and rainfall were each significantly and positively correlated with SH and AET. Temperature was also strongly correlated with SR (+) and elevation (−) (Table 3). Moreover, we found a significant positive relationship between the number of infections and SR (Fig. 3). Although no other variable was significantly correlated with the number of infections, a notable positive relationship with temperature and a negative relationship with rainfall were observed. Similarly, SH, AET and elevation are negatively related to the number of infections, while WS shows a positive relation (Fig. 3). Surprisingly, we found … Figure 3i illustrates the VIP of each variable. Large VIP values (> 1) were recorded for SR, rainfall, temperature and AET, whereas elevation, wind speed, PD and SH had VIP values below 1.
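The VIP ranking described above can be reproduced from the outputs of a fitted PLS model. The Python below is a minimal sketch of the standard VIP formula. It assumes the per-component weight vectors and the fraction of variance in y explained by each component are already available from a PLS fit, for example one computed with NIPALS. The function names, the variable names x1 to x3, and the example numbers are all hypothetical. None of them are the authors' code or data.

```python
import math
from typing import List, Sequence


def vip_scores(weights: Sequence[Sequence[float]],
               r2_per_component: Sequence[float]) -> List[float]:
    """Compute VIP_j = sqrt(p * sum_a R2_a * (w_aj / ||w_a||)^2 / sum_a R2_a).

    weights[a][j]       -- PLS weight of predictor j in component a
    r2_per_component[a] -- fraction of variance in y explained by component a
    """
    if not weights or len(weights) != len(r2_per_component):
        raise ValueError("need one weight vector and one R^2 value per component")
    p = len(weights[0])
    if any(len(w_a) != p for w_a in weights):
        raise ValueError("all weight vectors must have the same length")
    total_r2 = sum(r2_per_component)
    if total_r2 <= 0:
        raise ValueError("total explained variance must be positive")

    # Squared Euclidean norm of each component's weight vector (for normalising w_aj).
    norms_sq = [sum(w * w for w in w_a) for w_a in weights]
    if any(n_sq == 0 for n_sq in norms_sq):
        raise ValueError("weight vectors must be non-zero")

    scores = []
    for j in range(p):
        # R^2-weighted sum over components of the squared normalised weight of predictor j.
        acc = sum(r2_a * w_a[j] ** 2 / n_sq
                  for w_a, r2_a, n_sq in zip(weights, r2_per_component, norms_sq))
        scores.append(math.sqrt(p * acc / total_r2))
    return scores


def important_variables(names: Sequence[str], scores: Sequence[float],
                        threshold: float = 1.0) -> List[str]:
    """Return the names whose VIP exceeds the threshold (1.0 is the usual rule of thumb)."""
    return [name for name, score in zip(names, scores) if score > threshold]


if __name__ == "__main__":
    # Hypothetical two-component fit with three predictors.
    w = [[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
    r2 = [0.5, 0.25]
    vip = vip_scores(w, r2)
    print([round(v, 3) for v in vip], important_variables(["x1", "x2", "x3"], vip))
```

Within each component, the normalised squared weights sum to one. As a result, the squared VIP scores always sum to p, and their mean is exactly 1. This is why VIP > 1 is the customary cut-off for a variable that contributes more than average.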
Using the GAM, we attempted to relate the number of infected cases to all the geographical variables considered in this study. Initially, the simple linear GAM showed no significant relation (R² = 0.219). However, log-transforming all variables substantially improved the performance of the model (R² = 0.782). Applying spline smoothing to the log-transformed PD (Fig. 4a) and E (Fig. 4b) raised R² to 0.895 (Fig. 4c). The parametric coefficients and the approximate significance of the smooth terms are listed in Table S4 (see supplementary files), which shows that all coefficients were statistically significant at the 0.05 level (p < 0.05). Log-transforming the data and smoothing E and PD thus greatly improved the model's prediction accuracy, and the model captured the complex non-linearity in the relationship between COVID-19 infections and geographical distribution. Although simple bivariate correlation did not reveal significant relationships, the strong GAM result suggests that multiple interacting parameters should be taken into account in further investigations in any spatial context. The GAM indicates that NI is negatively associated with SH and R, and positively associated with SR and T.

COVID-19 has recently caused a significant health burden in many places around the world. In this paper, we investigated the spatial relationship between long-term climate, topography and social factors and the counts of confirmed COVID-19 cases in India. A substantial number of studies around the world have already examined whether the COVID-19 outbreak is correlated with prevailing weather or climatic conditions (Bashir et al. 2020; Sajadi et al. 2020). The prevailing meteorology (temperature, humidity, WS, etc.)
can significantly alter environmental stability and might therefore affect the survival of viruses and the transmission process (Tosepu et al. 2020). According to Chen et al. (2020), COVID-19 transmission is significantly affected by ambient air temperature and humidity, a finding shared by Shi et al. (2020) for the major outbreak in mainland China. In this study, we found a positive correlation between the number of infections and long-term records of temperature, WS, SR (significant) and PD. In China, Shi et al. (2020) reported a negative correlation between temperature and COVID-19 transmission on the basis of daily weather records, whereas Ma et al. (2020) reported a positive association between mortality and daily temperature in Wuhan, China. In the global context, transmission has been found to be higher in particular regions of subtropical countries where the ambient air temperature is considerably low (Poole 2020). The significant correlation between SR and COVID-19 infection in India indicates that high daytime insolation does not prevent COVID-19 transmission. Sunlight can, however, boost the immune system and slow the growth of infections in the human body (Cannell et al. 2006; Miller 2018; Asyary and Veruswati 2020). Asyary and Veruswati (2020) investigated the role of sunlight in COVID-19 outbreak and recovery; they observed no noticeable trend between sunlight exposure and the transmission rate, but reported a significantly higher recovery rate under sunlight exposure. Our study indicates a negative association between the number of infections and rainfall, SH, AET and elevation. A time-series study from China likewise indicated a negative correlation between daily relative humidity and COVID-19 transmission (Qi et al. 2020).
Moreover, many previous epidemiological investigations reported a negative association between humidity and coronavirus-like diseases (Zhang Qiang et al. 2004; Gardner et al. 2019). The findings of the present study in the Indian context agree with these reports. We did not find any study relating regional elevation to COVID-19 transmission. We nevertheless included the average elevation of each province, since elevation strongly controls climatic conditions. Our results indicate that low-lying regions of India are more likely to experience higher COVID-19 transmission. The occurrence of infections across climatic regions suggests that transmission is likely lower in the provinces in the very wet and extremely wet categories, pointing to lower transmission under wet conditions. Moreover, accounting for 29.2% of total cases in India, Maharashtra has already … In the present study, the GAM, which accounts for several geographical parameters jointly, predicted the number of infected cases with high accuracy. The GAM results suggest that hot and dry areas are more likely to be affected by COVID-19 transmission. Higher WS may improve ventilation at the microscale, but our study suggests that it has no appreciable effect at the regional scale. Residual plots of the smooth terms (i.e., PD and E) indicate that population density or regional topography alone may not explain the infection counts; however, these factors are important in combination with meteorology. Like any scientific investigation, our study has several limitations: (1) we used only long-term climatic records to examine the association between COVID-19 cases and prevailing conditions, and an investigation using real-time daily weather data in the different states is needed.
(2) As the disease is caused by a virus, many other factors might be considered, such as population migration, immunity, age groups and hygiene systems. Despite these limitations, this study is significant because, to our knowledge, it is the first to investigate the association between climate and COVID-19 transmission in the Indian context. It is a basic analysis, and incorporating more detailed (e.g., district-level) data would support a stronger conclusion.

The present study aimed to understand the geographical influence on the spatial distribution of COVID-19 transmission at the regional level in India. Several statistical analyses indicate that climatic factors have a considerable influence on this viral disease in India. The heterogeneity in the spatial occurrence of infections might be attributed to local meteorology together with geographical location and population. However, no single attribute can by itself explain the nature of transmission. The positive association with SR and temperature and the negative association with humidity and rainfall suggest that hot and arid low-altitude regions should strictly follow preventive measures on an emergency basis.
Investigation of effective climatology parameters on COVID-19 outbreak in Iran
Cutoff threshold of variable importance in projection for variable selection
Sunlight exposure increased COVID-19 recovery rates: a study in the central pandemic area of Indonesia
Spatial distribution of climatic indices in northern Greece
Correlation between climate indicators and COVID-19 pandemic
Epidemic influenza and vitamin D
Effects of air temperature and relative humidity on coronavirus survival on surfaces
Roles of meteorological conditions in COVID-19 transmission on a worldwide scale
Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities
A case-crossover analysis of the impact of weather on primary cases of Middle East respiratory syndrome
Severe acute respiratory syndrome-related coronavirus: the species and its viruses, a statement of the Coronavirus Study Group
First case of 2019 novel coronavirus in the United States
Role of air distribution in SARS transmission during the largest nosocomial outbreak in Hong Kong
Effects of temperature variation and humidity on the death of COVID-19 in Wuhan
Immune system: your best defense against viruses and bacteria from the common cold to the SARS virus
Another decade, another coronavirus
Seasonal influences on the spread of SARS-CoV-2 (COVID19), causality, and forecastabililty (3-15-2020)
Temperature significantly changes COVID-19 transmission in (sub)tropical cities of Brazil
COVID-19 transmission in mainland China is associated with temperature and humidity: a time-series analysis
Temperature and latitude analysis to predict potential spread and seasonality for COVID-19
Impact of temperature on the dynamics of the COVID-19 outbreak in China
A review of coronavirus disease-2019 (COVID-19)
Correlation between weather and Covid-19 pandemic in Jakarta
PLS-partial least squares projections to latent structures
A new coronavirus associated with human respiratory disease in China
Evaluation of changes in different climates of Iran, using De Martonne index and Mann-Kendall trend test
Meteorological characteristics and their impacts during the SARS epidemic period

Acknowledgements
S.D. and S.B. wish to thank the Department of Geography, Savitribai Phule Pune University, for providing necessary facilities to carry out this study. All authors are thankful to covid19india.org, census India, CGIAR, TerraClimate and Worldclim websites for providing required data used in this study.