key: cord-0943781-0n74gzif
authors: Karim, Md. Rezaul; Akter, Mst. Bithi; Haque, Sejuti; Akter, Nazmin
title: Do Temperature and Humidity Affect the Transmission of SARS-CoV-2?-A Flexible Regression Analysis
date: 2021-07-25
journal: Ann
DOI: 10.1007/s40745-021-00351-y
sha: 92dd49082a850895b8cb6159954dc5282738e472
doc_id: 943781
cord_uid: 0n74gzif

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus that causes coronavirus disease 2019 (COVID-19). Temperature and humidity are two essential factors in the transmission of SARS-CoV-2, which affects the human respiratory system. This study aimed to investigate the effects of temperature and humidity on the transmission of SARS-CoV-2 and the spread of COVID-19. The daily number of new SARS-CoV-2 infections and the daily number of deaths due to COVID-19 are the response variables. Data were collected from March 08, 2020 to January 31, 2021. A flexible regression model under the Generalized Additive Models for Location, Scale and Shape framework is used to analyze the data. Temperature and humidity have a significant impact on the transmission of SARS-CoV-2. Temperature is highly significant for both the number of new SARS-CoV-2 cases and the number of deaths due to COVID-19. In contrast, humidity is significant for the number of new cases but not for the number of deaths due to COVID-19 at the 5% level of significance. The analysis revealed that both temperature and humidity inversely affected the daily numbers of deaths and new cases of COVID-19.

Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus (officially named SARS-CoV-2; formerly called 2019-nCoV), has become a major public health problem all over the world [1]. In light of the rising danger, the World Health Organization (WHO) declared COVID-19 an international public health emergency [2].
Although it is still unknown exactly where the outbreak first started, many early cases of COVID-19 have been attributed to people who had visited the Huanan Seafood Wholesale Market in Wuhan, Hubei, China [3]. Globally, as of March 21, 2021, there have been 123.55 million confirmed cases of COVID-19, including 2.72 million deaths; 99.53 million of the confirmed cases have recovered [4]. Bangladesh is a well-known climate-vulnerable country due to its high population density and complex meteorological settings [5]. In Bangladesh, the first coronavirus cases were confirmed on March 08, 2020 by the country's epidemiology institute, the Institute of Epidemiology, Disease Control and Research (IEDCR). It has been reported that temperature, humidity, wind, and precipitation may favour either the spread or the inhibition of epidemic episodes, and that the transmission of viruses is influenced by weather conditions and the density of people [6, 7]. Although Bangladesh is a densely populated country (about 160 million people), COVID-19 in Bangladesh seems less acute. As of March 21, 2021, there have been 570,878 confirmed cases, including 8690 deaths; 522,105 of the confirmed cases have recovered [4]. The moderate transmission of COVID-19 might be due to the influence of tropical weather (high temperature and often excessive humidity). Meteorological parameters are important factors influencing infectious diseases such as severe acute respiratory syndrome (SARS) and influenza [8]. High temperature and high humidity are thought to have a combined effect on the inactivation of coronaviruses. In contrast, the opposite weather conditions can prolong the survival time of the virus on surfaces and facilitate the transmission of, and susceptibility to, the viral agent [9]. There is also some evidence that COVID-19 cases have clustered particularly around cooler, drier regions [10, 11].
Many articles have examined the effects of temperature and humidity on the spread of COVID-19, and a systematic review has been published in [12]. Most studies find a significant effect of temperature and humidity on the spread of COVID-19. However, the evidence is still incomplete, because some studies found no association between COVID-19 transmission and temperature (see, for example, [13, 14]). In addition, viruses mutate continuously, and SARS-CoV-2 is no exception. Callaway [15] states that SARS-CoV-2 has been mutating at a rate of about 1-2 mutations per month. Mutations can have a negative or positive impact on the capability of the virus to survive and replicate, depending on where in the SARS-CoV-2 genome the changes occur. The researcher cautioned that these mutant lineages of SARS-CoV-2 would continue to arise under the continued uncontrolled transmission of SARS-CoV-2 in many parts of the world. Viral mutations and variants in the United States are routinely monitored through sequence-based surveillance, laboratory studies, and epidemiological investigations [16]. Recently, a novel SARS-CoV-2 variant (known as lineage B.1.1.7) emerged in the United Kingdom (UK) in November 2020 and spread quickly to other countries [17]. A total of 17 mutations have been recorded in the new strain found in the UK. Virologists in Bangladesh have announced that a new SARS-CoV-2 strain found locally is somewhat similar to the one recently discovered in the United Kingdom [18]. We do not know how temperature and humidity affect the transmission of the mutated SARS-CoV-2 strain. Hence, it is crucial to understand the behaviour of the transmission of SARS-CoV-2 using current data. Therefore, the main objective of this research is to investigate the effects of temperature and humidity on the transmission of SARS-CoV-2 by using flexible regression models.
We try to understand the seasonal behaviour of the transmission of SARS-CoV-2 and the spread of COVID-19. The materials and methods, including the data sources and statistical models, are explained in Sect. 2. Section 3 describes the data analysis and results. Finally, the discussion and conclusions are presented in Sect. 4.

Data on COVID-19 cases are collected from the daily reports of the Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh, for the period from March 08, 2020 to January 31, 2021. The data are available at https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Bangladesh. The daily temperature (measured in °C) and humidity (%) of Bangladesh are collected from the website https://www.timeanddate.com/weather/bangladesh/dhaka.

Generalized Linear Models (GLM) and Generalized Additive Models (GAM), introduced by [19] and [20] respectively, are very popular in statistical data analysis. Rigby and Stasinopoulos [21] proposed the generalized additive model for location, scale and shape (GAMLSS) as a way of overcoming some of the limitations of GLM and GAM in regression analysis. It is a general framework of (semi)parametric regression models in which the distribution of the response variable does not have to belong to the exponential family and may be a highly skewed or kurtotic continuous or discrete distribution. We consider the logarithmic transformation of the number of SARS-CoV-2 infected new cases and of the number of deaths due to COVID-19 as the response variables of the GAMLSS model. In the sequel, for notational convenience, we write "number of new cases" for "number of SARS-CoV-2 infected new cases". To avoid taking the logarithm of zero and to reduce the computational complexity under the GAMLSS modelling framework, we add 1.1 to each response variable before the logarithmic transformation. For each response variable, we fit the GAMLSS model separately.
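The shifted logarithmic transformation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the natural logarithm is assumed, the function names `log_response` and `back_transform` are hypothetical, and the daily counts are made-up values, not the study data.

```python
import math

OFFSET = 1.1  # constant added to each count before taking logs, as stated in the text


def log_response(counts, offset=OFFSET):
    """Return log(count + offset) for each daily count (natural log assumed)."""
    return [math.log(c + offset) for c in counts]


def back_transform(values, offset=OFFSET):
    """Invert the transformation to get back to the count scale."""
    return [math.exp(v) - offset for v in values]


# Hypothetical daily counts; a zero count is now well defined on the log scale.
daily_new_cases = [0, 3, 41, 1626, 4019]
y = log_response(daily_new_cases)
recovered = back_transform(y)
```

Because the offset is added before the logarithm, any fitted value on the log scale has to be mapped back with `exp(value) - 1.1` to be read as a count.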
The probability distribution of each response variable (Y) under the GAMLSS modelling framework is chosen based on the minimum Bayesian information criterion (BIC) and Akaike information criterion (AIC) values. The Normal Exponential-t distribution is selected for Y = log(number of new cases), and the Gumbel distribution is selected for Y = log(number of death). The selection procedure is described in detail in Sect. 3.2.

The Normal Exponential-t (NET) distribution was first introduced by [22] as a robust method of fitting the mean and scale parameters of a symmetric distribution as functions of explanatory variables (X). The probability density function (pdf) of the NET distribution, denoted by NET($\mu, \sigma, \nu, \tau$), is given by [22] and defined by

$$
f_Y(y \mid \mu, \sigma, \nu, \tau) = \frac{c}{\sigma}
\begin{cases}
\exp\left(-\dfrac{z^2}{2}\right), & |z| \le \nu, \\
\exp\left(-\nu |z| + \dfrac{\nu^2}{2}\right), & \nu < |z| \le \tau, \\
\exp\left(-\nu\tau \log\dfrac{|z|}{\tau} - \nu\tau + \dfrac{\nu^2}{2}\right), & |z| > \tau,
\end{cases}
$$

where $z = (y - \mu)/\sigma$, $c = (c_1 + c_2 + c_3)^{-1}$, $c_1 = \sqrt{2\pi}\,[1 - 2\Phi(-\nu)]$, $c_2 = \dfrac{2}{\nu} \exp\left(-\dfrac{\nu^2}{2}\right)$ and $c_3 = \dfrac{2}{(\nu\tau - 1)\nu} \exp\left(-\nu\tau + \dfrac{\nu^2}{2}\right)$, and where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal variate. Note that the location parameter $\mu$ is the mean of Y; details of the density can be found in [23]. We are interested in estimating the mean function in the regression setting.

The pdf of the Gumbel distribution (also called an extreme value or Gompertz distribution), denoted by GU($\mu, \sigma$), is defined by

$$
f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma} \exp\left\{ \frac{y - \mu}{\sigma} - \exp\left( \frac{y - \mu}{\sigma} \right) \right\}, \qquad -\infty < y < \infty,
$$

with $E(Y) = \mu - \gamma\sigma$, where $\gamma \approx 0.577$ is the Euler-Mascheroni constant, and $\mathrm{Var}(Y) = \pi^2 \sigma^2 / 6$; details of the density can be found in [23].

The covariates considered for both response variables are time (in days), temperature, and humidity. The strength of the GAMLSS model is that its systematic part can be extended to model not only the location (usually the mean) but also other parameters of the distribution, such as scale and shape. These parameters can be linear parametric and/or additive non-parametric functions of covariates and/or random effects. In this research, we choose flexible predictor models based on fractional polynomial and B-spline functions to find the smoothing function of the predictor time.
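As a numerical sanity check on the Gumbel moments quoted above, the sketch below integrates the density directly. It assumes the minimum extreme value parameterisation, with density $\sigma^{-1}\exp\{z - e^{z}\}$ and $z = (y-\mu)/\sigma$, so that the mean is $\mu - \gamma\sigma$. The parameter values and helper names are hypothetical and chosen only for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_pdf(y, mu, sigma):
    """Density of the (left-skewed) Gumbel distribution GU(mu, sigma)."""
    z = (y - mu) / sigma
    return math.exp(z - math.exp(z)) / sigma


def midpoint_integral(func, lo, hi, steps=100_000):
    """Plain midpoint rule; adequate for a smooth density with light tails."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


# Hypothetical parameter values, not estimates from the paper.
mu, sigma = 3.0, 0.8
lo, hi = mu - 40 * sigma, mu + 10 * sigma  # density is negligible outside this range

total = midpoint_integral(lambda y: gumbel_pdf(y, mu, sigma), lo, hi)
mean_num = midpoint_integral(lambda y: y * gumbel_pdf(y, mu, sigma), lo, hi)
var_num = midpoint_integral(
    lambda y: (y - mean_num) ** 2 * gumbel_pdf(y, mu, sigma), lo, hi
)

mean_formula = mu - EULER_GAMMA * sigma
var_formula = math.pi ** 2 * sigma ** 2 / 6
```

The numerical mean and variance can then be compared with `mean_formula` and `var_formula`; note that under this parameterisation $\mu$ is the mode of the distribution, not its mean.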
To estimate the conditional mean of the response variable Y given the covariates X = (time, temperature, humidity), we have to estimate the parameters (as functions of X) of the conditional distribution of Y given X. Therefore, the flexible regression models for the location function $\mu(X)$ and the scale function $\sigma(X)$ under the flexible GAMLSS modelling framework can be written as

[Equation (1): model for the location function $\mu(X)$]

and

[Equation (2): model for the scale function $\sigma(X)$]

(Penalized) maximum likelihood estimation is used to estimate the parameters of models (1) and (2).

The fractional polynomial used in the flexible predictor models is a generalization of the polynomial function. The general form of a fractional polynomial in x of degree m can be written as

$$
f_p(x; \beta; p_1, \ldots, p_m) = \sum_{j=0}^{m} \beta_j H_j(x), \qquad (3)
$$

where m is an integer and

$$
H_j(x) =
\begin{cases}
x^{(p_j)}, & p_j \ne p_{j-1}, \\
H_{j-1}(x) \log(x), & p_j = p_{j-1},
\end{cases}
\qquad
x^{(p)} =
\begin{cases}
x^{p}, & p \ne 0, \\
\log(x), & p = 0,
\end{cases}
$$

with $p_0 = 0$ and $H_0(x) = 1$, for a sequence of powers $p_1 \le p_2 \le \cdots \le p_m$ from the grid $\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$. The optimal combination of powers is selected by the smallest value of BIC. We select $p_1 = 0$, $p_2 = 0$, $p_3 = 0.5$ and $m = 3$ for the response variable log(number of new cases), and hence model (3) can be written as

$$
f_p(\text{time}; \beta_1; 0, 0, 0.5) = \beta_{10} + \beta_{11} \log(\text{time}) + \beta_{12} [\log(\text{time})]^2 + \beta_{13} (\text{time})^{0.5}. \qquad (4)
$$

For the response variable log(number of death), we select $p_1 = 1$, $p_2 = 2$ and $p_3 = 2$, and the fractional polynomial in the time (in days) variable of degree $m = 3$ for model (3) can be written as

$$
f_p(\text{time}; \beta_1; 1, 2, 2) = \beta_{10} + \beta_{11} \times \text{time} + \beta_{12} \times (\text{time})^2 + \beta_{13} \times (\text{time})^2 \times \log(\text{time}). \qquad (5)
$$

Flexible smoothing functions with basis spline (B-spline) models were also fitted in order to obtain a more flexible approximation to the data. A general form of the B-spline predictor model of x of degree D can be written as

$$
f_b(x; \beta, D, K) = \sum_{j=0}^{D} \beta_j x^{j} + \sum_{k=1}^{K} \beta_{D+k} (x - b_k)^{D} H(x > b_k), \qquad (6)
$$

where K is the number of knots, $b_k$ is the knot value at the kth interval or piece, and $H(x > b_k)$ is the Heaviside function, taking the value 1 if $x > b_k$ and 0 otherwise. The combination of D, K, and the knot values is chosen based on the lowest value of BIC.
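The two predictor families can be made concrete by building their basis columns for a single value of the covariate. The sketch below assumes the usual fractional polynomial convention (a power of 0 means log(x), and a repeated power multiplies the previous term by log(x)) and a truncated power form for the spline, which is what the Heaviside description corresponds to. The function names are hypothetical, and the code only builds the design columns; it does not fit a GAMLSS model.

```python
import math


def fp_basis(x, powers):
    """Return [H_0(x), ..., H_m(x)] for a fractional polynomial with the given powers."""
    if x <= 0:
        raise ValueError("fractional polynomials need x > 0")
    basis = [1.0]      # H_0(x) = 1
    prev_power = 0     # p_0 = 0
    for p in powers:
        if p == prev_power:
            term = basis[-1] * math.log(x)   # repeated power: multiply by log(x)
        elif p == 0:
            term = math.log(x)               # power 0 is read as log(x)
        else:
            term = x ** p
        basis.append(term)
        prev_power = p
    return basis


def truncated_power_basis(x, degree, knots):
    """Return [1, x, ..., x^D, (x - b_1)_+^D, ..., (x - b_K)_+^D]."""
    columns = [float(x) ** j for j in range(degree + 1)]
    columns += [(x - b) ** degree if x > b else 0.0 for b in knots]
    return columns


# Powers (0, 0, 0.5): 1, log(t), log(t)^2, t^0.5, matching model (4).
new_cases_terms = fp_basis(4.0, [0, 0, 0.5])
# Powers (1, 2, 2): 1, t, t^2, t^2 * log(t), matching model (5).
death_terms = fp_basis(2.0, [1, 2, 2])
# D = 3 and K = 0 reduces to an ordinary cubic polynomial in time.
cubic_terms = truncated_power_basis(3.0, 3, [])
# Two hypothetical knots at 2 and 5: only the first knot is active at x = 3.
spline_terms = truncated_power_basis(3.0, 3, [2.0, 5.0])
```

Multiplying each list by a coefficient vector and summing gives the value of the predictor at that time point; the coefficients themselves come from the (penalized) likelihood fit, which is not reproduced here.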
To explore the raw data and to guide the choice of a more sophisticated statistical model, this section provides descriptive statistics and some graphical presentations of the variables. Table 1 summarizes the descriptive statistics of the daily number of deaths due to COVID-19, the daily number of SARS-CoV-2 infected new cases, and the meteorological variables temperature and humidity for n = 324 days. The study included 8033 deaths in total and 535,139 confirmed cases during that period. The averages of the daily number of deaths due to COVID-19 and of the daily number of SARS-CoV-2 infected new cases are 24.79 and 1626.2, respectively. The temperature ranged from a lowest value of 20 °C to a highest value of 37 °C, and the humidity from a lowest value of 21% to a highest value of 100%.

Histograms with kernel density plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 new cases are presented in Fig. 1. Figure 1a shows that the distribution of the number of deaths due to COVID-19 seems symmetric, indicating that a bell-shaped distribution would be one of the best probability models for this variable. In contrast, Fig. 1b reveals that the distribution of the number of SARS-CoV-2 infected new cases looks skewed, indicating that a skewed distribution would be more suitable for predicting this variable's values. The scatter plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 infected new cases against the time index for the period from August 03, 2020 to January 31, 2021 are drawn in Fig. 2. We clearly see a nonlinear relationship between the response variables and the time index. The scatter plots of the number of deaths due to COVID-19 and the number of SARS-CoV-2 infected new cases against humidity are shown in Fig. 3, and the corresponding plots against temperature are shown in Fig. 4.
These figures show that there is a connection between both response variables and the covariates temperature and humidity. Without adjusting for the time effect, we first consider the following regression model to explore only the conditional relationship between the two response variables, Y = log(number of new cases) and Y = log(number of death), and the two covariates temperature and humidity. The mean regression model is, for i = 1, 2, …, n,

$$
y_i = \beta_0 + \beta_1\, \text{temperature}_i + \beta_2\, \text{humidity}_i + \varepsilon_i, \qquad (7)
$$

where $y_i$ = log(number of new cases) (or log(number of death)) and $\varepsilon_i$ is the disturbance term for the ith observation. Under the classical regression model assumptions (see, for example, [24]), the summary statistics of model (7) are tabulated in Table 2. The exploratory results show that temperature is not significant for either the number of new cases or the number of deaths. In contrast, humidity is highly significant for both response variables. Next, we include the time (in days) variable as a covariate in the model. Since the exploratory data analysis shows a nonlinear relationship between time and the response variables, we need advanced computer-intensive statistical models for further research.

To select the best probability model for the response variable Y = log(number of new cases), the summary, including AIC and BIC values with their degrees of freedom, of all selected candidate distributions from the GAMLSS family is provided in Table 7 in "Appendix". From all of these distributions, we selected five candidate distributions based on the minimum BIC; they are provided in Table 9 in "Appendix". The smallest BIC and AIC are observed for the NET model. In contrast, the highest value of BIC (and also of AIC) is observed for the Skew-t type-3 model. Based on the minimum BIC, we select the NET model to explain the transmission of SARS-CoV-2 for further investigation.
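The information criteria used for this selection are simple functions of the global deviance. As a hedged illustration, the sketch below assumes the usual definitions AIC = GD + 2·df and SBC (BIC) = GD + log(n)·df, and checks them against the figures reported in Sect. 3 for the two final models of log(number of new cases). The degrees of freedom (10 and 17) are not stated in the text; they are inferred here from the reported gap between AIC and global deviance. The candidate deviances in the ranking example are hypothetical.

```python
import math


def aic(global_deviance, df):
    """Akaike information criterion from the global deviance."""
    return global_deviance + 2 * df


def sbc(global_deviance, df, n):
    """Schwarz Bayesian criterion (BIC) from the global deviance."""
    return global_deviance + math.log(n) * df


def select_min_sbc(candidates, n):
    """Return the name of the candidate with the smallest SBC.

    candidates maps a name to a (global_deviance, df) pair.
    """
    return min(candidates, key=lambda name: sbc(*candidates[name], n))


n_days = 324  # sample size reported in the paper

# Reported global deviances; df inferred as (AIC - GD) / 2.
fp_aic = aic(288.373, 10)
fp_sbc = sbc(288.373, 10, n_days)
bs_aic = aic(-46.572, 17)
bs_sbc = sbc(-46.572, 17, n_days)

# Hypothetical deviances for three candidate distributions with 4 parameters each.
hypothetical = {"NET": (300.0, 4), "candidate B": (310.0, 4), "candidate C": (325.0, 4)}
best = select_min_sbc(hypothetical, n_days)
```

With equal degrees of freedom the ranking by SBC coincides with the ranking by deviance; the penalty only changes the ordering when candidates differ in the number of fitted parameters.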
Similarly, for the response variable Y = log(number of death), the summary results, including AIC and BIC values with degrees of freedom, of all selected candidate distributions from the GAMLSS family are provided in Table 8 in "Appendix". From all of these distributions, we selected five candidate distributions based on the minimum BIC; they are presented in Table 10 in "Appendix". The smallest values of BIC and AIC are observed for the Gumbel model. In contrast, the highest value of BIC (and also of AIC) is observed for the Skew-t type-4 model. Therefore, the Gumbel model is chosen as the best model to describe the number of deaths due to COVID-19 for further analysis.

For the response variable log(number of new cases), the corresponding estimated fractional polynomial model for $\mu(X)$ in time (in days) of degree 3 is

[Equation (8)]

The summary statistics of this estimated flexible predictor model (8) are tabulated in Table 3. Hence, the estimated flexible regression model for the mean function $E(Y \mid X) = \mu(X)$ of the conditional NET distribution under the GAMLSS modelling framework is

[Equation (9)]

Note that the two fixed parameters take the values $\nu = 1.5$ and $\tau = 2$ in the GAMLSS modelling framework. For the final fitted model, the Global Deviance is 288.373, the AIC is 308.373, and the SBC is 346.181. Table 3 shows that temperature and humidity are highly significant for the number of SARS-CoV-2 infected new cases. In addition, the regression coefficients of both temperature and humidity are negative, which indicates a negative relationship between these variables and the number of SARS-CoV-2 infected new cases.

For the response variable log(number of death), the summary statistics of the corresponding estimated model are tabulated in Table 4. Hence, the estimated flexible regression model of (10) can be written as

[Equation (11)]

Table 4 shows that temperature and humidity are highly significant for the number of deaths due to COVID-19.

For the response variable log(number of new cases), we also use the B-spline function given in (6). The summary statistics of the estimated functions $\hat{\mu}(X)$ and $\log \hat{\sigma}(X)$ are presented in Table 5.
For this estimated model, the Global Deviance, AIC and SBC are −46.572, −12.572 and 51.700, respectively. In the estimated mean function $\hat{E}(Y \mid X) = \hat{\mu}(X_i)$, the slope coefficients of temperature ($\beta_1$) and humidity ($\beta_2$) are negative, which indicates a negative relationship between these variables and the response. In addition, the regression coefficients of both temperature and humidity are highly significant for the number of SARS-CoV-2 infected new cases. Similarly, for the estimated $\log \hat{\sigma}(X)$, the slope coefficients of temperature and humidity are also negative, which indicates a negative relationship, but neither regression coefficient is significant for the number of SARS-CoV-2 infected new cases at the 5% level of significance.

Table 5 The summary statistics of flexible regression models of $\mu(X)$ and $\log \sigma(X)$ via B-spline smoothing function for the response variable log(number of new cases)

To estimate $\mu(X)$ of the Gumbel distribution, we select D = 3 and K = 0 in the model given in (6). The estimated flexible function $f_b(\text{time}_i; \beta_0, 3, 0)$ for the ith observation is

[Equation (14)]

Using the estimated B-spline function given in (14), the estimated function of $\mu(X)$ can be written as

[Equation (15)]

By using the estimated B-spline model given in (15), the estimated scale function $\hat{\sigma}(X_i)$ for i = 1, 2, …, n is obtained in the same way. The summary statistics of the estimated models are tabulated in Table 6. In the estimated mean function $\hat{\mu}(X_i)$, the slope coefficients of temperature ($\beta_1$) and humidity ($\beta_2$) are positive, which indicates a positive relationship between these variables and the response. Table 6 shows that temperature is highly significant but humidity is not significant for the number of deaths due to COVID-19 at the 5% level of significance.
Similarly, for the estimated $\log \hat{\sigma}(X)$, the slope coefficients of temperature and humidity are negative, which indicates a negative relationship between these variables and the number of deaths due to COVID-19.

Table 6 The summary statistics of flexible regression models of $\mu(X)$ and $\log \sigma(X)$ via B-spline smoothing function for the response variable log(number of death)

We also calculate the predicted values of the response variables from the fractional polynomial and B-spline models. The actual and predicted values are plotted in Fig. 5. The estimated curve from the B-spline function is smooth, as expected. On the other hand, the estimated curve from the fractional polynomial function is not smooth. However, the two estimated curves are very close.

This study examined whether temperature and humidity affect the transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that affects the human respiratory system and causes coronavirus disease 2019 (COVID-19). We relied on the daily number of confirmed SARS-CoV-2 infected new cases and the daily total number of deaths due to COVID-19 reported by the Institute of Epidemiology, Disease Control and Research (IEDCR), Dhaka, Bangladesh. A generalised additive model for location, scale and shape (GAMLSS) is used to examine the effect of temperature and humidity on the daily number of confirmed SARS-CoV-2 infected new cases and on the daily number of deaths due to COVID-19 separately. Without adjusting for the time effect, the exploratory data analysis did not find a significant impact of temperature on either response variable. To investigate the effects of temperature and humidity after adjusting for the time variable, we used the flexible GAMLSS model. The best response distribution is chosen based on the minimum BIC under the GAMLSS modelling framework.
The Normal Exponential-t distribution is selected for log(number of new cases) and the Gumbel distribution for log(number of death). To estimate the systematic part of the GAMLSS model, we employed two flexible predictor models: (i) a fractional polynomial model and (ii) a B-spline smoothing model. Both models suggest that high temperature and high humidity significantly reduce the transmission of SARS-CoV-2. The fractional polynomial model indicates that high temperature and high humidity significantly reduce the number of deaths due to COVID-19. Many studies support these results (see, for example, [12]), but they are the opposite of the findings of [25]. According to the fitted fractional polynomial model, for every 1 °C increase in temperature, the number of deaths due to COVID-19 reduced by 8.9% (95% CI: 7.3%, 10.5%) and daily new cases reduced by 6.2% (95% CI: 4.6%, 7.8%); for every 1% increase in humidity, the number of deaths due to COVID-19 reduced by 1.5% (95% CI: 1.2%, 1.8%) and daily new cases reduced by 0.8% (95% CI: 0.48%, 1.1%), holding all other factors constant. On the other hand, the B-spline model suggests that high temperature and high humidity reduce the number of deaths due to COVID-19, with temperature having a significant effect. Humidity, however, significantly affects the number of deaths at the 10% level of significance but not at the 5% level. There are several possible reasons for the insignificant effect of humidity in the B-spline model. The sample size (n = 324) may not be large enough to detect a significant humidity effect in the B-spline model. Moreover, temperature and humidity are correlated. As the response variable is already well explained by temperature and the B-spline function of the time variable, the regression coefficient of humidity can have a high p value.
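Percentage effects of this kind follow from the log scale of the response: a slope coefficient b for a covariate corresponds to a multiplicative change of exp(b) per unit increase. The sketch below shows one common way to turn a coefficient and its standard error into a percent change with a Wald-type 95% interval. It is an assumption that the reported figures were obtained in this way, and the coefficient and standard error used here are hypothetical values, not entries from Tables 3 to 6. Since the response is log(count + 1.1), the change strictly applies to the shifted count.

```python
import math


def percent_change(beta, se, z=1.96):
    """Percent change in the (shifted) count per unit increase in a covariate.

    Returns (estimate, lower, upper), where the limits come from a
    Wald-type interval beta -/+ z * se mapped through exp().
    """
    def to_percent(b):
        return 100.0 * (math.exp(b) - 1.0)

    return to_percent(beta), to_percent(beta - z * se), to_percent(beta + z * se)


# Hypothetical temperature coefficient on the log scale and its standard error.
estimate, lower, upper = percent_change(beta=-0.05, se=0.01)

# For small coefficients, 100 * beta is a close first-order approximation.
approximate = 100.0 * -0.05
```

A negative estimate reads as a percentage reduction per unit increase in the covariate; the interval is asymmetric on the percent scale because of the exponential back-transformation.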
According to the fitted B-spline model, for every 1 °C increase in temperature, the daily number of deaths due to COVID-19 reduced by 0.8% (95% CI: 7.9%, 9.5%) and the daily new cases reduced by 2.2% (95% CI: 0.03%, 4.4%); for every 1% increase in humidity, the number of deaths due to COVID-19 reduced by 0.4% (95% CI: 1.2%, 1.9%) and daily new cases reduced by 0.3% (95% CI: 0.02%, 0.62%), holding all other factors constant. Although our analysis shows that temperature and humidity affect the transmission of SARS-CoV-2, temperature and humidity alone do not explain most of the variability in the transmission of SARS-CoV-2 infection. To capture the actual behaviour and variability of the transmission of SARS-CoV-2 infection, temperature and humidity have to be considered together with other confounding factors, such as population density, public health policies, public health interventions, social isolation campaigns, actual diagnosis, the transportation system, and people's lifestyle, in the computer-intensive statistical model.

References
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
World Health Organization declares global emergency: a review of the 2019 novel coronavirus (COVID-19)
Covid-19: epidemiology, evolution, and cross-disciplinary perspectives
COVID-19 coronavirus pandemic
Are precipitation concentration and intensity changing in Bangladesh overtimes? Analysis of the possible causes of changes in precipitation systems
Observed and potential impacts of the COVID-19 pandemic on the environment
Correlation between climate indicators and COVID-19 pandemic
Effects of temperature variation and humidity on the death of COVID-19 in Wuhan
The effects of temperature and relative humidity on the viability of the SARS coronavirus
High temperature and high humidity reduce the transmission of COVID-19
An initial investigation of the association between the SARS outbreak and weather: with the view of the environmental temperature and its variation
Effects of temperature and humidity on the spread of COVID-19: a systematic review
No association of COVID-19 transmission with temperature or UV radiation in Chinese cities
Association between ambient temperature and COVID-19 infection in 122 cities from China
The coronavirus is mutating-does it matter?
SARS-CoV-2 variants and ending the COVID-19 pandemic. The Lancet
Washburne AD et al (2021) Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science
No need to panic over new strain-experts say; local virologists found the variant in early November, almost similar to the UK one
Generalized linear models
Generalized additive models
Generalized additive models for location, scale and shape
Robust fitting of an additive model for variance heterogeneity
Instructions on how to use the gamlss package in R second edition
Effect of ambient temperature on COVID-19 infection rate

Appendix See Tables 7, 8, 9 and 10.

Funding This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

The data sources are described in Sect. 2. Data and R code are available from the authors on request.

We have conducted ourselves with integrity, fidelity, and honesty.
We have not intentionally engaged in or participated in any form of malicious harm to another person or animal. The authors declare that they have no conflict of interest.