key: cord-234737-trshrh6f
authors: Notari, Alessio
title: Temperature dependence of COVID-19 transmission
date: 2020-03-27
journal: nan
DOI: nan
sha:
doc_id: 234737
cord_uid: trshrh6f

The recent coronavirus pandemic follows an almost exponential growth in its early stages, with the number of cases well fit in time by $N(t)\propto e^{\alpha t}$ in many countries. We analyze the rate $\alpha$ for each country, starting from a threshold of 30 total cases and using the next 12 days, thus capturing the early growth homogeneously. We look for a link between $\alpha$ and the average temperature $T$ of each country in the month of the epidemic growth. We analyze a base set of 42 countries, which developed the epidemic earlier, an intermediate set of 88 countries and an extended set of 125 countries, which developed the epidemic more recently. Applying a linear fit $\alpha(T)$, we find increasing evidence for a decreasing $\alpha$ as a function of $T$, at $99.66\%$ C.L., $99.86\%$ C.L. and $99.99995\%$ C.L. ($p$-value $5 \cdot 10^{-7}$, or $5\sigma$ detection) in the base, intermediate and extended dataset, respectively. The doubling time is expected to increase by $40\%\sim 50\%$, going from $5^\circ$C to $25^\circ$C. In the base set, going beyond a linear model, a peak at $(7.7\pm 3.6)^\circ$C seems to be present, but its evidence disappears for the larger datasets. We also analyzed a possible bias: poor countries, often located in warm regions, might have less intense testing. By excluding countries below a given GDP per capita, we find that our conclusions are only slightly affected, and only for the extended dataset. The significance remains high, with a $p$-value of $10^{-3}-10^{-4}$ or less. Our findings give hope that, for northern hemisphere countries, the growth rate should significantly decrease as a result of both warmer weather and lockdown policies.
In general the propagation should hopefully be stopped by strong lockdown, testing and tracking policies, before the arrival of the cold season.

The recent coronavirus (COVID-19) pandemic is having a major effect in many countries, which needs to be faced with the highest degree of scrutiny. An important piece of information is whether the growth rate of the confirmed cases among the population could decrease with increasing temperature. Experimental research on related viruses indeed found a decrease at high temperature and humidity [1]. We try to address this question using available epidemiological data. A similar analysis for the data from January 20 to February 4, 2020, among 403 different Chinese cities, was performed in [2], and similar studies were recently performed in [3] [4] [5] [6] [7].

The paper is organized as follows. In section II we explain our methods, in section III we show the results of our analysis and in section IV we draw our conclusions.

We start our analysis from the empirical observation that the data for the coronavirus disease in many different countries follow a common pattern: once the number of confirmed cases reaches order 10, there is a very rapid subsequent growth, which is well fit by an exponential behavior. The latter is typically a good approximation for the following couple of weeks. After this stage of free propagation, the exponential growth typically slows down gradually, probably due to other effects, such as lockdown policies from governments, a higher degree of awareness in the population, or the tracking and isolation of the positive cases. Our aim is to see whether the temperature of the environment has an effect, and for this purpose we choose to analyze the first stage of free propagation in a selected sample of countries.
We choose our sample using the following rules:

• we start analyzing data from the first day in which the number of cases in a given country reaches a reference number $N_i$, which we choose to be $N_i = 30$ [8];

• we include only countries with at least 12 days of data after this starting point.

The data were collected from [9]. We then fit the data for each country with a simple exponential curve $N(t) = N_0 e^{\alpha t}$, with 2 parameters, $N_0$ and α; here t is in units of days. In the fit we used Poissonian errors, given by $\sqrt{N}$, on the daily counting of cases.

We then associated to each country an average temperature T for the relevant weeks, which we took from [10]. More precisely: if for a given country the average T is tabulated only for its capital city, we directly used that value. If, instead, several cities are tabulated for a given country, we used an average of the temperatures of the main cities, weighted by their population [11]. For most countries we used the average temperature for the month of March, with a few exceptions [12].

We analyzed three datasets. A base set of 42 countries was selected on March 26th. An intermediate set, with a total of 88 countries, was collected on April 1st. Finally an extended set, with a total of 125 countries, has been studied on April 14th [13], adding the following countries to the previous dataset: Belarus, Bolivia, Cameroon, Congo, Cote d'Ivoire, Cuba, Democratic Republic of Congo, Djibouti, El Salvador, Georgia, Ghana, Guatemala, Guinea, Honduras, Jamaica, Kenya, Kosovo, Kyrgyzstan, Madagascar, Mali, Mauritius, Montenegro, Niger, Nigeria, Paraguay, Puerto Rico, Rwanda, Sri Lanka, Togo, Trinidad and Tobago, Uganda, Uzbekistan, Venezuela, Zambia.

Using such datasets for α and T for each country, we fit with two functions α(T), as explained in the next section. Note that the statistical errors on the α parameters, considering Poissonian errors on the daily counting of cases, are typically much smaller than the spread of the values of α among the various countries.
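The per-country procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Mathematica analysis used in this work: the window is taken as the first day with at least 30 cases plus the 12 following days, the exponential fit is approximated by a weighted straight-line fit of ln N against t (a Poisson error $\sqrt{N}$ on N corresponds to an error $1/\sqrt{N}$ on ln N, hence a weight N), and the function names `select_window` and `fit_growth_rate`, as well as the input series, are hypothetical.

```python
import math


def select_window(cases, threshold=30, n_days=12):
    """Return the counts from the first day with >= threshold cases through
    the following n_days days, or None if the series is too short."""
    for start, n in enumerate(cases):
        if n >= threshold:
            window = cases[start:start + n_days + 1]
            return window if len(window) == n_days + 1 else None
    return None


def fit_growth_rate(window):
    """Weighted straight-line fit of ln N versus t (t in days, t = 0 at start).

    A Poisson error sqrt(N) on N propagates to an error 1/sqrt(N) on ln N,
    so each point gets weight 1/sigma^2 = N.
    Returns (N0, alpha) for N(t) = N0 * exp(alpha * t).
    """
    t = list(range(len(window)))
    y = [math.log(n) for n in window]
    w = list(window)
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    alpha = sxy / sxx
    n0 = math.exp(y_mean - alpha * t_mean)
    return n0, alpha


# Synthetic series (not real data): two early days, then exact growth
# with alpha = 0.2 per day starting from 30 cases.
synthetic = [3, 11] + [30 * math.exp(0.2 * t) for t in range(13)]
n0, alpha = fit_growth_rate(select_window(synthetic))
print(f"N0 = {n0:.1f}, alpha = {alpha:.3f} per day")
```

A direct nonlinear fit of $N_0 e^{\alpha t}$ with errors $\sqrt{N}$ would weight the points slightly differently; the log-linear version is used here only because it has a closed-form solution.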
The spread of α among countries is due to systematic effects, which are dominant, as we will discuss later on. For this reason we disregarded statistical errors on α. The analysis was done using the software Mathematica, from Wolfram Research, Inc.

We first fit the base dataset with a simple linear function $\alpha(T) = \alpha_0 + \beta T$, to look for an overall decreasing behavior. Results for the best fit, together with our data points, are shown in fig. 1. The estimate, standard deviation and confidence intervals for the parameters, together with the significance and the explained variance, $R^2$, are shown in Table I. From such results a clear decreasing trend is visible, and indeed the slope β is negative, at 99.66% C.L. (p-value 0.0034). However, the linear fit is able to explain only a small part of the variance of the data, with $R^2 = 0.196$ and an adjusted value $R^2_{\rm adjusted} = 0.175$, clearly due to the presence of many more factors.

In addition, a decrease of α toward lower temperatures is also visible in this dataset, below about 10 °C. For this reason we also fit with a quadratic function, $\alpha(T) = \alpha_0 - \beta (T - T_M)^2$. Results for the quadratic best fit are shown in Table II. From such results a peak is visible at around $T_M \approx 8$ °C. The quadratic model is able to explain a slightly larger part of the variance of the data, since $R^2 \approx 0.27$ [14]. Moreover, since the quadratic model has an extra parameter, one may quantify the improvement of the fit using for instance the Akaike Information Criterion (AIC) for model comparison, $\Delta\mathrm{AIC} \equiv 2\Delta k - 2\Delta \ln(L)$, where $\Delta k$ is the increase in the number of parameters, compared to the simple linear model, and $\Delta \ln(L)$ is the change in the maximum log-likelihood between the two models. This gives $\Delta\mathrm{AIC} = -2.1$, slightly in favor of the quadratic model.

Table II: In the left panel: best-estimate, standard deviation (σ) and 95% C.L. intervals for the parameters of the quadratic interpolation, for the base set of 42 countries. In the right panel: $R^2$ for the best-estimate and p-value of a non-zero β.

We then applied the same linear fit to the intermediate dataset; results are shown in fig. 3 and in Table III.
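As an illustration of the model comparison described above, the following Python sketch fits a straight line and a parabola to a set of (T, α) points by ordinary least squares and computes $R^2$, the adjusted $R^2$ and ΔAIC. It is not the Mathematica code used for the analysis. Two assumptions are made: the parabola is fit in the equivalent form $c_0 + c_1 T + c_2 T^2$, and the likelihood is Gaussian with a fitted variance, so that $-2\ln L_{\max} = n \ln(\mathrm{SSR}/n)$ up to a constant. The helper names and the data points are hypothetical and purely synthetic.

```python
import math


def polyfit(x, y, degree):
    """Ordinary least squares polynomial fit via the normal equations.
    Returns coefficients c with model c[0] + c[1]*x + c[2]*x**2 + ..."""
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def ssr(x, y, coeffs):
    """Residual sum of squares of the polynomial model."""
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
               for xi, yi in zip(x, y))


def r_squared(x, y, coeffs):
    """R^2 = 1 - SSR/SST, with SST the squared deviations from the mean."""
    mean = sum(y) / len(y)
    sst = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ssr(x, y, coeffs) / sst


def adjusted_r2(r2, n, n_predictors):
    """Adjusted R^2 for n data points and n_predictors regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def delta_aic(x, y):
    """AIC(quadratic) - AIC(linear), assuming Gaussian errors with fitted
    variance; negative values favor the quadratic model."""
    n = len(y)
    lin, quad = polyfit(x, y, 1), polyfit(x, y, 2)
    return 2 * 1 + n * math.log(ssr(x, y, quad) / ssr(x, y, lin))


# Synthetic (T, alpha) points, for illustration only.
temps = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
rates = [0.24, 0.27, 0.25, 0.28, 0.22, 0.23, 0.20, 0.21, 0.17, 0.18]
line = polyfit(temps, rates, 1)
r2 = r_squared(temps, rates, line)
print(f"beta = {line[1]:.4f}, R2 = {r2:.2f}, "
      f"adjusted R2 = {adjusted_r2(r2, len(temps), 1):.2f}, "
      f"delta AIC = {delta_aic(temps, rates):+.1f}")
```

With the rounded base-set values quoted in the text ($R^2 = 0.196$, 42 countries, one regressor), `adjusted_r2` gives about 0.176, consistent with the quoted 0.175 within rounding.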
For the intermediate dataset, the slope β is smaller in absolute value, but the significance actually slightly increases, since a zero slope is excluded at 99.86% C.L. (p-value 0.0014). Now $R^2 = 0.11$ and $R^2_{\rm adjusted} = 0.10$. In this sample the quadratic trend is not visible anymore, and indeed the AIC does not prefer the quadratic fit: $\Delta\mathrm{AIC} = +0.9$ compared to the linear fit, disfavoring the quadratic model. The $R^2$ is also practically the same as in the linear fit.

For the extended sample, results of the linear fit are shown in fig. 4 and in Table IV. The slope β becomes larger and, most importantly, the significance increases strongly, since a zero slope is now excluded at 99.99995% C.L. (p-value $5\cdot 10^{-7}$, or 5σ detection, translated in the language of a Gaussian distribution). Now $R^2 = 0.19$ and $R^2_{\rm adjusted} = 0.18$.

In this dataset, which extends to April 14th, a few anomalies are however present: in the case of Bangladesh and Thailand it is possible to see that the exponential growth became much faster after the initial 12 days. We have checked what happens when using a different interval of time for these 2 cases, instead of the standard 12 days. Namely, we have used 44 days for Thailand and 21 days for Bangladesh, which give the maximal value of α in both cases. The results for the linear fit using such corrected values are shown in Table V. The significance is lower, but still very high: p-value $4.6 \cdot 10^{-6}$, or 4.6σ detection, translated in the language of a Gaussian distribution.

Table V: In the left panel: best-estimate, standard deviation (σ) and 95% C.L. intervals for the parameters of the linear interpolation, for the extended set of 125 countries. Here Thailand and Bangladesh have been corrected for, as explained in the text. In the right panel: $R^2$ for the best-estimate and p-value of a non-zero β.

Finally we have tested the existence of a possible bias in the data: the fact that poor countries have less intense testing.
This could in principle be a source of major bias, since many countries with low income are located in warm regions. In order to rule out such a bias we have analyzed the existence of a nonzero linear correlation β on subsamples of the extended dataset, by excluding countries with low income. More specifically, we have set a threshold on the GDP per capita [15] and checked whether the correlation is still there when countries below such a threshold are excluded from the analysis. We show our results in Fig. 5: we find a correlation to exist, rather independently of the threshold that we applied. The significance of a nonzero β (p-value) is plotted in Fig. 6 and always remains between $5 \cdot 10^{-7}$ and $8 \cdot 10^{-4}$.

In addition, we have also checked for a correlation between the growth rate α and the GDP per capita (GDP for short). We find no significant correlation in the base and intermediate datasets, while we find a negative correlation in the extended dataset, with p-value = 0.0012. This is not so surprising, since the extended dataset contains many low-income countries, where the disease has arrived later, and where most likely testing is not intense enough. For this dataset we thus performed a linear fit with two variables, GDP and T. Results are shown in Table VI. The dependence on T is still highly significant, with p-value 0.000048, and the best-estimate is $\beta \approx -0.0031$. As expected, T also has a non-negligible correlation with the GDP per capita.

Table VI: In the top panel: best-estimate, standard error (σ), t-statistic and p-value for the parameters of the linear interpolation in two variables, temperature (T) and GDP per capita (GDP), for the extended set of 125 countries. In the bottom panel: $R^2$ and correlation coefficient (i.e. normalized off-diagonal element of the covariance matrix) between T and GDP.

We have collected data for countries that had at least 12 days of data after a starting point, which we fixed to be at the threshold of 30 confirmed cases.
We considered three datasets: a base dataset with 42 countries, collected on March 26th, an intermediate dataset with a total of 88 countries, collected on April 1st, and an extended dataset with a total of 125 countries, collected on April 14th. We have fit the data for each country with an exponential and extracted the exponent α for each country. Then we have analyzed such exponents as a function of the temperature T, using the average temperature for the month of March (or slightly earlier in some cases) for each of the selected countries.

For the base dataset we have shown that the growth rate of the transmission of COVID-19 has a decreasing trend as a function of T, at 99.66% C.L. (p-value 0.0034). In this fit $R^2 = 0.196$. In addition, using a quadratic fit, we have shown that a peak of maximal transmission seems to be present in this dataset at around $(7.7 \pm 3.6)$ °C. Such findings are in good agreement with a similar study, performed for Chinese cities [2], which also finds the existence of an analogous peak and an overall decreasing trend. Other similar recent studies [3] [4] [5] [6] find results which also seem to be in qualitative agreement.

For the intermediate dataset we also found a decreasing slope β. This is smaller in absolute value, but the significance remains high, since a zero slope is excluded at 99.86% C.L. (p-value 0.0014). For this fit we found $R^2 = 0.11$.

Finally, for the extended dataset we found a very high significance for a negative β, with p-value $5 \cdot 10^{-6} \sim 5 \cdot 10^{-7}$ (depending on the treatment of some anomalous cases), which would translate into a $4.5\sigma \sim 5\sigma$ detection, in the language of Gaussian distributions. Here $R^2 = 0.16 \sim 0.2$.

For all datasets we also tested the influence of a possible large bias: the fact that poorer countries have less intense testing, which might in principle be partially degenerate with effects of temperature.
Our analysis indicates that this should not be a major issue: by excluding countries with low income from the analysis we find small variations in the best-fit value of β, and the significance of the correlation β remains very high, with p-value $8 \cdot 10^{-4}$ or less. We have also checked for a correlation between the GDP per capita and α: we find a significant correlation only in the extended dataset. This should probably be interpreted as a sign that poorer countries do not have enough testing capabilities. However, after taking this variable into account, the dependence on T remains highly significant.

The decrease at high temperatures is expected, since the same happens for other coronaviruses [1]. It is unclear instead how to interpret the decrease at low temperature (less than 8 °C), present in the base dataset. This could be a statistical fluctuation, since it is not present in the intermediate and extended datasets. One possible reason for this decrease, if real, could be the lower degree of interaction among people in countries with very low temperatures, which could slow down the propagation of the virus.

A general observation is also that a large scatter in the residual data is present, clearly due to many other systematic factors, such as variations in the methods and resources used for collecting data and variations in the amount of social interactions, due to cultural reasons. Further study is required to assess the existence and the relevance of such factors.

As a final remark, our findings can be very useful for policy makers, since they support the expectation that with growing temperatures the coronavirus crisis should become milder in the coming few months, for countries in the Northern Hemisphere. As an example, the estimated doubling time with the quadratic fit is 2.6 days at the peak temperature of 7.7 °C, while at 26 °C it is expected to rise to about 4.6 days.
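For an exponential $N(t) = N_0 e^{\alpha t}$, the doubling time is $\ln 2/\alpha$, so the doubling times quoted above can be converted back into growth rates. The short Python sketch below does this conversion; the function names are hypothetical, and the growth rates it prints are derived here from the quoted doubling times rather than read from the fit tables.

```python
import math


def doubling_time(alpha):
    """Days needed for N(t) = N0 * exp(alpha * t) to double."""
    return math.log(2) / alpha


def growth_rate(t_double):
    """Growth rate alpha (per day) corresponding to a doubling time in days."""
    return math.log(2) / t_double


def doubling_time_increase(alpha_cold, alpha_warm):
    """Fractional increase of the doubling time when the growth rate drops
    from alpha_cold to alpha_warm."""
    return alpha_cold / alpha_warm - 1.0


for days in (2.6, 4.6):
    print(f"doubling time {days} days -> alpha = {growth_rate(days):.3f} per day")
# prints alpha = 0.267 per day and alpha = 0.151 per day
```

Since the doubling time is inversely proportional to α, an increase of the doubling time by 50% corresponds to α dropping to two thirds of its initial value.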
The linear fit implies an increase in the doubling time by 50% (or 40%), going from 5 °C to 25 °C, using the estimate from the extended dataset (or the extended dataset taking into account the GDP per capita, at a reference value of 40 thousand dollars). For countries with seasonal variations in the Southern Hemisphere, instead, this should give motivation to implement strong lockdown policies before the arrival of the cold season. We stress that, in general, it is important to fully stop the propagation, using strong lockdown, testing and tracking policies, taking advantage of the warmer season, and before the arrival of the next cold season.

[1] The Effects of Temperature and Relative Humidity on the Viability of the SARS Coronavirus
[2] Temperature significant change COVID-19 Transmission in 429 cities
[3] Spread of SARS-CoV-2 Coronavirus likely to be constrained by climate
[4] Will Coronavirus Pandemic Diminish by Summer?
[5] High Temperature and High Humidity Reduce the Transmission of
[6] Temperature, Humidity and Latitude Analysis to Predict Potential Spread and Seasonality for COVID-19
[7] The role of absolute humidity on transmission rates of the COVID-19 outbreak
[8] [...] the one in which the number of cases $N_i$ is closest to 30. In some countries, such a number $N_i$ is repeated for several days; in such cases we choose the last of such days as the starting point. For the particular case of China [...]
[11] We have subdivided Japan into three regions: Hokkaido, Okinawa and the rest of the country, using respectively the temperatures of Sapporo, Naha and Tokyo. For the U.S.A. we used the national average of about 5 [...]
[12] [...] Japan we considered an interpolating function of the temperature for the months of January, February and March and we took an average of such function in the relevant 12 days of the epidemic.
[13] Only countries with at least 300,000 inhabitants have been considered in this dataset.
[14] $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$, where SSR is the residual sum of squares and SST is the sum of the squared differences between the α values and their mean value.

We would like to acknowledge Viviana Acquaviva, Alberto Belloni, Ángel J. Gómez Peláez, Jordi Miralda and Giorgio Torrieri, for useful discussions and comments.