key: cord-0921800-7rx5irmi
authors: Velásquez, Ricardo Manuel Arias; Lara, Jennifer Vanessa Mejía
title: Gaussian approach for probability and correlation between the number of COVID-19 cases and the air pollution in Lima
date: 2020-07-03
journal: Urban Clim
DOI: 10.1016/j.uclim.2020.100664
sha: 73576635362767f5b8c3228287c93b0b16264af0
doc_id: 921800
cord_uid: 7rx5irmi

At the end of February 2020, the first cases of pneumonia associated with coronavirus (COVID-19) in Peru were reported in Lima (Rodriguez-Morales et al., 2020). The first week of March started with 72 infected people, and the government published a new law declaring a national crisis due to the COVID-19 pandemic (Vizcarra et al., 2020), with a quarantine in each city of Peru. Our analysis considers March and April 2020 for air quality measurements and infections in Lima. The data were collected at 6 meteorological stations and cover CO (carbon monoxide), NO2 (nitrogen dioxide), O3 (ozone), SO2 (sulfur dioxide), PM10 and PM2.5 (particulate matter with aerodynamic diameter less than 10 μm and 2.5 μm, respectively). The averages of these concentrations and the hospital information are collected hourly. The analysis was carried out during the quarantine; an important correlation was found between COVID-19 infections, NO2 and PM10 in the zone with the highest number of infections, despite the overall reduction of air pollution in Lima. In this paper, we propose a classification model based on Reduced-Space Gaussian Process Regression for air pollution and infections, taking into account the technological and environmental dynamics and the global change associated with COVID-19.
In an evaluation of zones in Lima, the results demonstrate the influence of industrial activity on air pollution and COVID-19 infections before and after quarantine, during the 28 days since the first infection in Peru. The problems relating to data management were validated with a successful classification and cluster analysis, for future work on the influence of environmental conditions on COVID-19.

China and Italy shut down all transportation from January to March 2020 and established numerous quarantines; in China alone, 28 quarantines were implemented (Wilder-Smith and Freedman, 2020). In these countries, air pollution and mortality were underestimated by the models in use. Considering 60 days with a decrease in NO2 air pollution in China, predictions for the last month indicate a 6% reduction in mortality due to air pollution (Dutheil F. et al. 2020). Fine particles (PM2.5) and NO2 are produced by burning fossil fuels in heavy industry, vehicles and boilers. Quarantine was implemented in several countries, including Peru (Vizcarra M. et al., 2020). Countries such as Italy have regions with the worst air quality, such as Lombardy and Emilia Romagna; these regions also have higher infection rates, which have increased rapidly. Pollution is associated with about 18% of deaths worldwide, and 4.9 million deaths are attributed to ambient particulate matter, NO2 and SOx (Landrigan P. et al. 2018). In addition, there is clear evidence of the "air pollution negative health effects such as chronic obstructive pulmonary disease (COPD), cardiovascular disease, and lung cancer that are widely reported in the literature" (Arias ). In Lima, the figures for premature deaths are dramatic: 2,300 deaths from PM and about 3,000 due to the indoor use of biomass-burning stoves, in an industrial environment (Arias Velásquez R. 2019).
In Lima, traffic has been analyzed from a temporal and spatial perspective with the eight air quality monitoring stations in the city. "These stations have detected correlation between PM2.5 with some meteorological parameters such as temperature, relative humidity, wind direction, and wind speed was carried out at a seasonal level that can be used for future works associated to pollution analysis and temporal spatial analysis" (Romero Y., et al. 2020). The complete development was carried out in Lima with six reference air quality stations, using data collected by SENAMHI (see Fig. 2). The stations were Carabayllo, Campo de Marte, San Juan de Lurigancho, Santa Anita, San Borja and Villa Maria del Triunfo. Lima has an area of 2,672 km2 and a population of over 12 million (as of 2019). As of April 9, 2020, SARS-CoV-2 has caused 5,256 infections with 138 deaths in Peru, with 3,704 people infected in Lima alone (about 70%).

Although COVID-19 deaths may be affected by many factors, this study explores the effect of meteorological parameters on COVID-19 deaths using Reduced-Space Gaussian Process Regression for data-driven probabilistic forecasting of air pollution and infections. The rest of this paper is organized as follows: Section 2 describes the background and the Reduced-Space Gaussian Process Regression for the correlation between COVID-19 and air pollution. Section 3 develops the results, the correlation and the classification function, and reports the main findings. Section 4 concludes.

Contamination and urban air pollution are critical factors for "prolonged inflammation, eventually leading to an innate immune system hyper-activation. In a small cohort of mice exposed for three months to particulate matter" of 2.5 μm and 10 μm in diameter, called PM2.5 and PM10 (Conticini E. et al. 2020), in COVID-19 and SARS (Yang J. et al. 2019).
High concentrations of "PM2.5 and PM10 lead to systemic inflammation with an over expression of PDGF, VEGF, TNFa, IL-1 and IL-6 even in healthy", non-smoking and young subjects, who then have a weakened immune system and a greater chance of contracting respiratory diseases. In light of this research, it is important to identify zones with high pollution levels, because those zones are more vulnerable to COVID-19.

(Radiology)NOT(radiation), with 26 research articles available as of April 9th, 2020; the main contributions are described as follows:

• Emission reductions due to quarantines during COVID-19 in China have been investigated: "transportation decreases PM2.5 concentrations. The decreases of PM2.5 in Beijing, Shanghai, Guangzhou, and Wuhan were 9.23, 6.37, 5.35, and 30.79 μg/m3, respectively" (Wang P. et al. 2020).
• During pregnancy, "respiratory rate remains unchanged in healthy pregnancy, and the finding of tachypnea is a significant finding and should prompt practitioners to further evaluate the patient, however new events in main cities have increased COVID-19 infections" (Juusela A. et al. 2020).
• Special consideration should be given to respiratory infections in pregnant and immunocompromised patients (Chavez S. et al. 2020).
• Air samples are now being investigated; the first step is to collect samples in hospitals and around patients, and those who have "close contact to the patients must adhere to national or international evidence based precautions" (Fadiri S. et al. 2020).

This research develops a Gaussian Process Regression (GPR) (Rasmussen C. et al. 2005) with dynamical systems. Here, GPR is used as a probabilistic regression framework, with a training data set, Eq. (1), of N pairs consisting of an input vector x_n ∈ R and a "noisy scalar output yn" (Rasmussen C. et al. 2005). For air pollution, the model should generalize to the distribution of the output at unseen input locations.
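The GPR framework introduced here can be illustrated with a minimal sketch. It is not the authors' reduced-space implementation: it assumes a zero mean function, a squared-exponential covariance, and synthetic data, and the names `sq_exp_kernel`, `gpr_posterior`, `sigma_f`, `length` and `sigma_n` are hypothetical. It shows only the standard closed-form posterior mean and covariance at unseen inputs, as given in the Gaussian process literature.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared-exponential covariance k(x, x'); sigma_f**2 is its maximum value.

    This kernel choice is an assumption made for illustration only.
    """
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

def gpr_posterior(x_train, y_train, x_test, sigma_n=0.1, sigma_f=1.0, length=1.0):
    """Posterior mean and covariance of a zero-mean GP at the test inputs."""
    # Covariance of the noisy training outputs: K + sigma_n^2 * I
    K = sq_exp_kernel(x_train, x_train, sigma_f, length)
    K += sigma_n**2 * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test, sigma_f, length)
    K_ss = sq_exp_kernel(x_test, x_test, sigma_f, length)

    # A Cholesky factorization avoids forming the explicit inverse of K
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Synthetic training pairs (x_n, y_n) with noisy scalar outputs; not study data
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 10.0, 25)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)

# Predict the output distribution at unseen input locations
x_test = np.array([2.5, 5.0, 7.5])
mean, cov = gpr_posterior(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print(mean, std)
```

In a real application the hyper-parameters would be fitted to the data, for example by maximizing the marginal likelihood, rather than fixed by hand as in this sketch.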
The noise in the output models the observation error, so a Gaussian distribution generates the input-output relationship in Eq. (2). Here, a "Gaussian process is a set of random values, they must be indexed by some" x ∈ X as a subset. With Bayes' theorem, it is possible to "make inferences on function values to unseen inputs conveniently using a finite number of training data" (Berry T. et al. 2015). For this process, we consider a mean function m(x) and a covariance function k(x,x') (Arias Velásquez R.M. et al., 2019), as in Eq. (3) and Eq. (4), where the hyper-parameter with subscript 1 is the maximum covariance.

In Eq. (9), Bayes' rule is written with a normalization to find (f, f*) in Eq. (10). Eq. (11) corresponds to "conditioning the joint Gaussian prior distribution on the observations, resulting in the closed-form Gaussian distribution" (Zhong Y. et al. 2016). With Eq. (12), for f*, the mean and covariance are added directly to obtain Eq. (13). Finally, Eq. (14) makes it feasible to use more than twelve thousand training data points for classification and forecasting.

For cross-validation, a neural network (NN) is used. It is composed of a set of nodes and synapses and takes a signal, or input, from a data set; in this research it "performs computation by propagating the signal along the connections" until it influences the response in the inner layer. By analogy with a "biological neuron's spiking action, a nonlinear activation function is applied to nodes in any hidden layer and the output layer" (Chollet F., 2015). The model was fitted using Keras; for the convolutional neural network (CNN), a nonlinear function is considered in Eq. (15). In Eq. (16), each node in the hidden layer has a "linear combination" (Zhang W. et al., 2019) with a chain of events. The hidden layers are composed as follows:

• Convolutional layers: they abstract local features at different locations.
• Pooling layers: they use the average value from each subarea of the previous layer.
• Fully-connected layers: they work like a regular neural network.

This type of network is powerful for "seizing local geometric features, spatial patterns and detects larger-scale features in deeper layers" (Lecun Y. et al., 1998). W^(1) in Eq. (18) is a matrix of weights, analogous to the regression coefficients in Eq. (19), with b^(1) ∈ R^M. Where:

h: the hidden layer, which is "linearly transformed and passed to the output layer".
y: the output vector; its maximum entry assigns x to a class.
k: the class.

Finally, the neural network identifies the correct classifier for COVID-19 infections and the air pollution parameters used for the forecast, after the Gaussian process. The framework in Fig. 3 makes it possible to obtain knowledge from the geographic information systems for COVID-19 and air pollution. The feedback from the ANN makes it possible to check the forecast on future data sets and to verify the accuracy day by day.

Fig. 4 shows the 3,704 infections as of April 9th, 2020; Lima has been selected for this case study. There are 7 zones aggregated for each meteorological station in Fig. 2. As we can see in Fig. 4, the Jesus Maria district (Zone 4) is the most affected by COVID-19; it is an industrial zone with high pollution. In the GPR analysis, the classifier detected four clusters. In the early days of March 2020, before quarantine, it separated the infections caused by a weak respiratory system (predictive infections) from the infections detected in Lima (green colour). These predictive infections have a common factor: hypertension, heart disease, breathing trouble or diabetes. Cluster 2 demonstrates the quarantine effects, an important change in the infection curve, from March 16th to 30th, and cluster 3 runs from March 31st to April 9th, as shown in Fig. 5.
Finally, the industrial zone has the highest air pollution during quarantine; pollution is worst near the meteorological stations with the highest values of NO2, PM10 and PM2.5. For PM10, Fig. 10 demonstrates the sensitivity with and without quarantine. The influence is more evident for NO2 than for PM10, as shown in Fig. 7. Villa Maria del Triunfo and Carabayllo show decreased particulate matter content in the air. Fig. 9 shows an interesting correlation between NO2, PM10 and infections: in the industrial zones of Lima, high values of NO2 coincide with higher infection levels, although PM10 has decreased; the life cycle of PM10 is shorter than that of NO2.

In Lima, air pollution is an important factor. COVID-19 infections in zones with the same population density are highly correlated with NO2, with and without quarantine restrictions. Quarantine has decreased the values of PM2.5 and PM10 because of their short life cycle; therefore, particulate matter is not appropriate for this analysis. Lima is the biggest city with air pollution problems in Latin America. Our findings show a high correlation between NO2 and COVID-19 infections. This should be investigated in more countries, and it is an important task in zones with industrial facilities. Furthermore, Fig. 10 describes the correlation of infections with NO2 and with PM10, with R of 98.827% and 95.38%, respectively, and demonstrates the sensitivity with and without quarantine. With the Gaussian approach, it is possible to forecast infections if we consider the historical data of NO2 and COVID-19 infections. This research is based on a classifier model using Reduced-Space Gaussian Process Regression for air pollution and infections.
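The correlation values reported above can be illustrated with a short sketch. It assumes that R denotes the Pearson correlation coefficient, which the text does not state explicitly. The function `pearson_r` and the arrays `no2`, `pm10` and `infections` are hypothetical, and the numbers are made up for illustration; they are not the measurements from the Lima stations.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()  # center each series on its mean
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

# Hypothetical daily averages for one zone (NOT the study's data)
no2 = np.array([12.0, 15.0, 18.0, 22.0, 26.0, 30.0])          # ug/m3
pm10 = np.array([60.0, 48.0, 41.0, 35.0, 30.0, 28.0])         # ug/m3
infections = np.array([40.0, 55.0, 70.0, 95.0, 110.0, 135.0])

r_no2 = pearson_r(no2, infections)
r_pm10 = pearson_r(pm10, infections)
print(f"R(NO2, infections)  = {100 * r_no2:.2f}%")
print(f"R(PM10, infections) = {100 * r_pm10:.2f}%")
```

A coefficient of this kind measures linear association only. A high value would not by itself establish that NO2 causes infections, which is consistent with the call for further studies.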
The key finding of this study is that the effect of air pollution on mortality associated with NO2 has dramatically increased under the government's quarantine actions. New policies should evaluate improvements in these zones, and more research should corroborate this first step. Empirically, we show that people with immediate access to emergency healthcare are less susceptible to air pollution than those without and, definitely, show lower values than zone 4. This novel methodology was developed for the evaluation of COVID-19 infections and air quality: CO, NO2, O3, SO2, PM10 and PM2.5. For mean NO2, our findings suggest that industrial zones with NO2 higher than 26 μg/m3 can see increased infections by COVID-19. Finally, GPR is a comprehensive air quality methodology for the analysis of infections and respiratory risk. Our analyses identify which pollutants are more health-damaging but tell little about how the interactions of different air pollutants can affect human health. Further studies are needed to investigate these issues.

Citizen science approach for spatiotemporal modelling of air pollution quality
Insulation failure caused by special pollution around industrial environments
Forecast and evaluation of COVID-19 spreading in USA with reduced-space Gaussian process regression
Converting data into knowledge for preventing failures in power transformers
Life estimation of shunt power reactors considering a failure core heating by floating potentials
Nonparametric forecasting of low-dimensional dynamical systems
Pattern recognition and image preprocessing
COVID-19): A primer for emergency physicians
Can atmospheric pollution be considered a co-factor in extremely high level of SARS-CoV-2 lethality in Northern Italy?
COVID-19 as a factor influencing air pollution?
Adel Mokamel, Mohammad Sadegh Hassanvand, Talat Mokhtari-Azad, A field indoor air measurement of SARS-CoV-2 in the patient rooms of the largest hospital in Iran
Two Cases of COVID-19 Related Cardiomyopathy in Pregnancy
The Lancet Commission on pollution and health
Gradient-based learning applied to document recognition
Gaussian processes in machine learning
Should the Holy Week 2020 be cancelled in Latin America due to the COVID-19 pandemic? Travel Medicine and Infectious Disease, in press, corrected proof
Temporal and spatial analysis of traffic-related pollutant under the influence of the seasonality and meteorological variables over an urban city in Peru
An overview of deep learning in medical imaging focusing on MRI
Isolation, quarantine, social distancing and community containment: pivotal role for old-style public health measures in the novel coronavirus (2019-nCoV) outbreak
WHO (World Health Organization), n.d. How air pollution is destroying our health
The influence of PM2.5 on lung injury and cytokines in mice
Removal of impulse noise in color images based on convolutional neural network
Reduced-Space Gaussian Process Regression for Data-Driven Probabilistic Forecast of Chaotic Dynamical Systems

The authors would like to thank the Universidad Nacional de San Agustín de Arequipa for its knowledge contribution to this research. There is no conflict of interest in this work.