key: cord-1030240-0ypcp96g
authors: Copiello, Sergio; Grillenzoni, Carlo
title: The spread of 2019-nCoV in China was primarily driven by population density. Comment on “Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China” by Zhu et al.
date: 2020-07-16
journal: Sci Total Environ
DOI: 10.1016/j.scitotenv.2020.141028
sha: aedb6028d43920a0dc8454e2620b8af246a8ccad
doc_id: 1030240
cord_uid: 0ypcp96g

Abstract

Recently, an article published in the journal Science of the Total Environment and authored by Zhu et al. has claimed the “Association between short-term exposure to air pollution and COVID-19 infection” (doi: https://doi.org/10.1016/j.scitotenv.2020.138704). This note shows that the stated dependence between the diffusion of the infection and air pollution may be the result of spurious correlation due to the omission of a common factor, namely, population density. To this end, the relationship between demographic, socio-economic, and environmental conditions and the spread of the novel coronavirus in China is analyzed with spatial regression models on variables deflated by population size. The infection rate, measured as the number of cases per 100 thousand inhabitants, is found to be strongly related to population density. At the same time, the association with air pollution is detected with a negative sign, which is difficult to interpret.

The outbreak of the coronavirus pandemic (Cheng and Shan, 2020) has stimulated a multitude of studies on the topic in just a few months. Yet only a few of them go beyond the clinical scope and deal with other epidemiological aspects. Nevertheless, both the earlier literature on other viruses and some recent studies on the novel coronavirus have examined the likely relationship between socio-economic and environmental conditions and the diffusion of pandemics.
In particular, those studies point to the role potentially played by weather conditions (Iqbal et al., 2020; Sobral et al., 2020), transportation (Adda, 2016; Jia et al., 2020), economic activity (Sarmadi et al., 2020), and air pollution (Coccia, 2020; Conticini et al., 2020). As far as the latter aspect is concerned, it is worth noting that pollution emissions are already known to be associated with respiratory viral infections (Becker and Soukup, 1999; Ciencewicki and Jaspers, 2007; Cui et al., 2003; Horne et al., 2018; Mehta et al., 2013; Xu et al., 2016; Ye et al., 2016). Recently, an article published in the journal Science of the Total Environment (Zhu et al., 2020) has supported the "Association between short-term exposure to air pollution and COVID-19 infection" based on an analysis of 120 Chinese cities using a generalized additive model. The authors find "significantly positive associations of PM2.5, PM10, CO, NO2 and O3 with COVID-19 confirmed cases" (p. 3). In this note, we show that the results of that study may be affected by spurious correlation due to the omission of a common factor, namely, population density. Far be it from us to deny that air pollution, and other factors as well, may have amplified the spread of the pandemic. The issue is that, except for weather conditions, the concurrent factors suggested so far (i.e., transportation volumes, economic activity, and air pollution) are anthropic in nature. Thus, they all depend on the extent of human activities: the larger the population, the higher the transportation volumes, economic activity, air pollution, and virus infections (Fig. 1). Accordingly, when it comes to measuring the actual effect of anthropogenic causes on the pandemic, normalizing by population size and controlling for population density (Coccia, 2020) is by no means optional.
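The omitted-common-factor argument can be illustrated with a small simulation. The Python sketch below uses purely synthetic data, not the provincial data analysed in this note: for each hypothetical region, log emissions and log confirmed cases are both equal to log population plus independent noise, so by construction pollution has no effect on infection. All parameter values (2,000 regions, the means and spreads of the noise terms, the seed) are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_regions=2000, seed=1):
    """Synthetic regions where population drives both emissions and cases."""
    rng = random.Random(seed)
    log_cases, log_emissions = [], []
    log_rate, log_emissions_pc = [], []
    for _ in range(n_regions):
        log_pop = rng.gauss(14.0, 1.0)       # population varies widely across regions
        pc_emissions = rng.gauss(-10.0, 0.2)  # log emissions per capita (independent noise)
        pc_cases = rng.gauss(-9.0, 0.3)       # log cases per capita (independent of emissions)
        log_cases.append(log_pop + pc_cases)
        log_emissions.append(log_pop + pc_emissions)
        log_rate.append(pc_cases)
        log_emissions_pc.append(pc_emissions)
    # Correlation of totals vs. correlation of population-deflated quantities
    return pearson(log_cases, log_emissions), pearson(log_rate, log_emissions_pc)


raw_corr, per_capita_corr = simulate()
print(f"ln(cases) vs ln(emissions):                 {raw_corr:.3f}")
print(f"ln(cases per capita) vs ln(emissions p.c.): {per_capita_corr:.3f}")
```

With these settings the correlation between the log totals comes out above 0.9, while the correlation between the per-capita quantities is close to zero: the apparent association in the totals is produced entirely by the shared dependence on population.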
Under the above framework, we consider an alternative model, whose data cover almost all Chinese provinces and their socio-economic and environmental conditions. As far as the dependent variables are concerned, let us denote by Cov the total number of confirmed cases of 2019-nCoV, and by RCov the incidence rate of the infection, namely, the number of cases per 100 thousand inhabitants. The first set of covariates is as follows:
- Pop is the population size;
- Den is the population density, namely, the ratio between population and area in km²;
- Grp is the gross regional product;
- Pr stands for the yearly average precipitation;

This study focuses on 28 mainland Chinese provinces, autonomous regions, and municipalities outside Hubei province. Tibet and Guizhou are excluded due to missing data. Data about pollution emissions are taken from the paper "The Pollution state in 31 Provinces and Regions in China" (Yang and Yang, 2011). Although outdated, those values are assumed to be a fair proxy of current emissions in Chinese provinces. To study the relationships between dependent and independent variables, we use spatial autoregressive (SAR) models (Copiello and Grillenzoni, 2017; Elhorst, 2010) with exogenous predictors:

$$\mathrm{Cov}_i = \alpha + \rho\,\overline{\mathrm{Cov}}_{i-1} + \beta_0\,\mathrm{Pop}_i + \beta_1\,\mathrm{Den}_i + \beta_2\,\mathrm{Grp}_i + \beta_3\,\mathrm{Pr}_i + \beta_4\,\mathrm{Th}_i + \beta_5\,\mathrm{SO2}_i + \beta_6\,\mathrm{Iwg}_i + e_i, \quad e_i \sim \mathrm{IN}(0, \sigma^2) \qquad (1)$$

$$\mathrm{RCov}_i = \alpha + \rho\,\overline{\mathrm{RCov}}_{i-1} + \beta_1\,\mathrm{Den}_i + \beta_2\,\mathrm{Grp}_i + \beta_3\,\mathrm{Pr}_i + \beta_4\,\mathrm{Th}_i + \beta_5\,\mathrm{SO2}_i + \beta_6\,\mathrm{Iwg}_i + u_i \qquad (2)$$

where $i$ is the province index, $\alpha$, $\beta_j$, and $\rho$ are the coefficients, and $e_i$ and $u_i$ are residuals, which are expected to be independent and normally distributed. In the models of Eqs. (1)-(2), $\rho$ is the spatial autocorrelation coefficient; hence, $\overline{\mathrm{Cov}}_{i-1}$ and $\overline{\mathrm{RCov}}_{i-1}$ are spatially lagged dependent variables (i.e., the mean values of $\mathrm{Cov}_{ji}$ and $\mathrm{RCov}_{ji}$ in the $j$ provinces contiguous to the $i$th area). These lagged terms aim to identify whether the analyzed phenomenon has a spatial pattern according to Tobler's (1970) first law of geography, namely, that "everything is related to everything else, but near things are more related than distant things" (p. 236).
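The two constructions behind Eqs. (1)-(2), the incidence rate RCov and the spatial lag taken as the mean over contiguous provinces, can be sketched in a few lines of Python. The four areas, their populations, case counts, and adjacency lists below are hypothetical toy values, not Chinese provinces, and the treatment of an area with no contiguous neighbours (lag set to 0.0) is an assumption of this sketch rather than something stated in the note.

```python
def incidence_rate(cases, population, per=100_000):
    """Cases per `per` inhabitants (RCov when per = 100,000)."""
    return cases * per / population


def spatial_lag(values, neighbours):
    """Mean of `values` over each area's contiguous neighbours.

    This equals multiplying the vector of values by a row-standardised
    contiguity matrix. Areas with no neighbours (e.g. an island) get 0.0,
    which is only one possible convention.
    """
    lagged = {}
    for area, nbrs in neighbours.items():
        if nbrs:
            lagged[area] = sum(values[n] for n in nbrs) / len(nbrs)
        else:
            lagged[area] = 0.0
    return lagged


# Hypothetical toy data (not real provinces); adjacency is symmetric.
population = {"A": 10_000_000, "B": 50_000_000, "C": 20_000_000, "D": 5_000_000}
cases = {"A": 200, "B": 1_000, "C": 300, "D": 50}
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

rcov = {k: incidence_rate(cases[k], population[k]) for k in cases}
rcov_lag = spatial_lag(rcov, neighbours)
for area in sorted(rcov):
    print(area, rcov[area], round(rcov_lag[area], 4))
```

Here area C borders A, B, and D, whose rates are 2.0, 2.0, and 1.0 cases per 100,000, so its lagged value is about 1.667. Note that B has five times the cases of A but the same incidence rate, which is exactly the distinction between Cov and RCov.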
In order to satisfy the assumption of homoscedasticity (i.e., $\sigma^2$ independent of $i$), all variables are transformed with natural logarithms (ln). The core of the analysis lies in the statistical significance and sign (+ or -) of the estimated coefficients $\beta_j$. In particular, differences in the $\beta_j$ between the models of Eqs. (1)-(2) are a symptom of spurious correlation between epidemic and environmental variables. The estimates of the models of Eqs. (1)-(2) are provided in Tables 1-2 (see also …). The coefficients of the spatially lagged terms are not significant, meaning that no spatial correlation is detected in the data. However, that may depend on the high level of spatial aggregation of provincial data, and it suggests that local policies may be suitable for controlling the epidemic. For example, movement restrictions would be better adopted at the national level, at least provided that national borders are not porous. As regards the explanatory variables, in the model of Cov (the absolute number of confirmed cases), the average maximum temperature Th plays a significant role. Apart from indirect effects (namely, the higher the temperature, the higher the level of social interactions, and so the spread of the infection), the positive coefficient of Th invites other interpretations. Assuming that the new coronavirus was already circulating before December 2019, this could imply that the recent global outbreak is also related to the mild weather conditions experienced in February 2020 (Masters, 2020). That contrasts with the expectation that the epidemic will spread less easily and more slowly during spring and summer as temperatures get warmer, as also suggested in other articles published in the journal Science of the Total Environment (Ma et al., 2020; Xie and Zhu, 2020).
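Because all variables enter the models in natural logarithms, deflating the case count by population becomes a simple difference, which makes the link between Eqs. (1) and (2) explicit. The derivation below is a simplified sketch: it sets the spatially lagged term aside and writes $X_{j,i}$ (hypothetical shorthand, not the note's notation) for the remaining covariates of Eq. (1). It only shows how the two specifications relate and does not reproduce the estimates in Tables 1-2.

```latex
\begin{align}
\mathrm{RCov}_i &= 10^5\,\frac{\mathrm{Cov}_i}{\mathrm{Pop}_i}
  \;\Longrightarrow\;
  \ln \mathrm{RCov}_i = \ln \mathrm{Cov}_i - \ln \mathrm{Pop}_i + \ln 10^5, \\
\ln \mathrm{Cov}_i &= \alpha + \beta_0 \ln \mathrm{Pop}_i
  + \sum_{j=1}^{6} \beta_j \ln X_{j,i} + e_i \\
\Longrightarrow\quad
\ln \mathrm{RCov}_i &= \left(\alpha + \ln 10^5\right) + \left(\beta_0 - 1\right) \ln \mathrm{Pop}_i
  + \sum_{j=1}^{6} \beta_j \ln X_{j,i} + e_i .
\end{align}
```

Under these simplifications, a rate model without Pop, as in Eq. (2), corresponds to a count model in which the population elasticity is fixed at $\beta_0 = 1$, that is, confirmed cases proportional to population size.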
However, it has to be considered that the incidence rate of the infection, adjusted for population density and other factors, has been found to be inversely associated with warmer and drier weather conditions (Byass, 2020). Another significant covariate of the overall confirmed cases is the gross regional product Grp, which takes on a positive sign. Incidentally, that predictor is significantly correlated with some of the variables representing air pollution (SO2: correlation 0.5516, p-value 0.0023; Iwg: correlation 0.5412, p-value 0.0029). Apparently, this finding confirms the association found in the study authored by Zhu et al. (2020). Nevertheless, it is a trivial result. It is to be expected that the overall confirmed cases are higher in the most populated areas, which are usually also the most industrialized and wealthy and, as a consequence, the most polluted ones. The problem becomes much more evident when turning to the predictors of the incidence rate RCov. Population density is a significant driver of the number of cases per 100,000 population. That is in keeping with earlier literature (Amuakwa-Mensah et al., 2017), as well as with recent studies focusing on how population size and population density affect both the current and future spread of COVID-19 (Jahangiri et al., 2020; Rocklöv and Sjödin, 2020; Zhang et al., 2020). That might explain why, to date, the epidemic has hit several densely populated areas around the world so hard: the Lombardy region in Italy, North Rhine-Westphalia in Germany, the Madrid metropolitan area in Spain, New York in the United States, São Paulo in Brazil, and so forth. That is precisely the issue with the finding presented by Zhu et al. (2020): the authors did not normalize the number of novel coronavirus cases by population before testing the relationship with air pollution and other covariates.
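The reported p-values for the correlations between Grp and the pollution variables can be checked with the standard t test for a correlation coefficient, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom. The Python sketch below assumes a Pearson-type correlation and $n = 28$ (the number of provinces in the sample); both are assumptions, since the note does not state the test used. To stay dependency-free, the two-sided p-value is obtained by integrating Student's t density with Simpson's rule rather than calling a statistics library.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20_000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|].

    The integral uses composite Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(k * h, df)
    return 1.0 - 2.0 * total * h / 3.0


N_PROVINCES = 28  # assumption: correlations computed over the 28 provinces
reported = {"SO2": 0.5516, "Iwg": 0.5412}  # correlations with Grp reported above

results = {}
for name, r in reported.items():
    t = t_statistic(r, N_PROVINCES)
    p = two_sided_p(t, N_PROVINCES - 2)
    results[name] = (t, p)
    print(f"{name}: r = {r:.4f}, t = {t:.3f}, two-sided p = {p:.4f}")
```

Under these assumptions the t statistics are about 3.37 and 3.28, and the resulting p-values should land close to the reported 0.0023 and 0.0029, which is consistent with the correlations having been computed over the 28 provinces.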
The authors state that their generalized additive model also includes "city fixed effects … to control for time-invariant city characteristics such as population size and density" (p. 2). Unfortunately, the results are not reported in full detail, so the role played by those fixed effects is unknown, as is how they interact with the variables measuring the pollutants. Hence, an open question remains: would the COVID-pollution relationship be confirmed using the incidence rate, instead of the number of confirmed cases, as the dependent variable? Furthermore, in the model of Eq. (2), the level of emissions of industrial waste gas Iwg is another significant predictor of the number of cases per 100,000 population. Nevertheless, it takes on a negative sign, which leaves room for doubt about the hypothesis that air pollution has actually played a role in the spread of 2019-nCoV.

References

Economic Activity and the Spread of Viral Diseases: Evidence from High Frequency Data
Multiple regression: A primer
Climate variability and infectious diseases nexus: Evidence from Sweden
Exposure to urban air particulates alters the macrophage-mediated inflammatory response to respiratory viral infection
Eco-epidemiological assessment of the COVID-19 epidemic in China
Novel coronavirus: where we are and what we know
Air Pollution and Respiratory Viral Infection
Factors determining the diffusion of COVID-19 and suggested strategy to prevent future accelerated viral infectivity similar to COVID
Can atmospheric pollution be considered a co-factor in extremely high level of SARS-CoV-2 lethality in Northern Italy?
Is the cold the only reason why we heat our homes? Empirical evidence from spatial series data
Air pollution and case fatality of SARS in the People's Republic of China: an ecologic study
An interactive web-based dashboard to track COVID-19 in real time
Applied Spatial Econometrics: Raising the Bar
Short-Term Elevation of Fine Particulate Matter Air Pollution and Acute Lower Respiratory Infection
The nexus between COVID-19, temperature and exchange rate in Wuhan city: New findings from partial and multiple wavelet coherence
The sensitivity and specificity analyses of ambient temperature and population size on the transmission rate of the novel coronavirus (COVID-19) in different provinces of Iran
Population flow drives spatio-temporal distribution of COVID-19 in China
Effects of temperature variation and humidity on the death of COVID-19 in Wuhan
Earth's 2nd Warmest February and 3rd
Ambient particulate air pollution and acute lower respiratory infections: a systematic review and implications for estimating the global burden of disease
China Statistical Yearbook
High population densities catalyse the spread of COVID-19
Association of COVID-19 global distribution and environmental and demographic factors: An updated three-month study
Association between climate variables and global transmission of SARS-CoV-2
A Computer Movie Simulating Urban Growth in the Detroit Region
Association between ambient temperature and COVID-19 infection in 122 cities from China
Urban Areas in
The Pollution state in 31 provinces and regions in China
Haze is a risk factor contributing to the rapid spread of respiratory syncytial virus in children
The Effect of Population Size for Pathogen Transmission on Prediction of COVID-19 Pandemic Spread
Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China