key: cord-0909414-9fssqaea authors: Miranda, Amanda Carvalho; Santana, José Carlos Curvelo; Yamamura, Charles Lincoln Kenji; Rosa, Jorge Marcos; Tambourgi, Elias Basile; Ho, Linda Lee; Berssaneti, Fernando Tobal title: Application of neural network to simulate the behavior of hospitalizations and their costs under the effects of various polluting gases in the city of São Paulo date: 2021-10-29 journal: Air Qual Atmos Health DOI: 10.1007/s11869-021-01077-9 sha: 842cb2163473605a285ac37450607ffb79a03905 doc_id: 909414 cord_uid: 9fssqaea
This work aims to obtain an artificial neural network to simulate hospitalizations for respiratory diseases influenced by the gaseous pollutants CO, PM10, PM2.5, NO2, O3, and SO2 emitted from 2011 to 2017 in the city of São Paulo. The hospitalization costs were also calculated. MLP and RBF neural networks were tested by varying the number of neurons in the hidden layer and the type of output function. The following pollutants and concentration ranges were recorded by the Alto Tiete station set, which covers several neighborhoods of the city of São Paulo, from 2011 to 2017: 28–63 µg/m³ of PM2.5, 52–110 µg/m³ of PM10, 49–135 µg/m³ of O3, 0.8–2.6 ppm of CO, 41–98 µg/m³ of NO2, and 3–16 µg/m³ of SO2. The results showed that an RBF neural network with 6 input neurons, 13 hidden-layer neurons, and 1 output neuron, trained with the BFGS algorithm and using a Gaussian activation function, best fitted the experimental datasets. Thus, knowing the monthly concentrations of the gaseous pollutants, it was possible to predict the hospitalization of 1464 to 3483 ± 510 patients, with costs between 570,447 and 1,357,151 ± 198,171 USD per month. This neural network can therefore be used to predict the costs of hospitalizing patients for respiratory diseases and to support decisions on how much the government should spend on health care.
The effects of air pollution on people have been studied around the world, correlating the photochemical effects of air pollution with health, the respiratory system, and the aggravation of allergic diseases. According to Arbex et al. (2012), the individuals most susceptible to diseases caused by pollutant emissions are children, the elderly, people with chronic diseases, and people with genetic susceptibility. In addition, pollutants can affect the human fetus during pregnancy, causing intrauterine growth retardation, prematurity, low birth weight, and, in the most severe cases, congenital anomalies and intrauterine or perinatal death. Researchers in different countries have sought to correlate the effects of air pollutants with respiratory diseases. They have identified an increase in the incidence of asthma in adults living in high-traffic regions of Switzerland (Kunzli et al. 2010), an increase that was also associated with rising PM2.5 levels in the Netherlands (Gehring et al. 2010); in addition, the combination of PM10 and NO2 emissions was linked to premature deaths from cardiovascular and respiratory diseases in China and Italy (De Marco et al. 2019; Yu et al. 2016). A 10 µg/m³ increase in PM2.5 concentration was associated with an increase in lung cancer incidence in the USA (Reis et al. 2018) and with an increase of 3.9 mm Hg in the average 24-h systolic blood pressure of hypertensive and/or diabetic workers in Brazil (Santos et al. 2019). The increased incidence of hospitalization of patients with chronic obstructive pulmonary disease (COPD) has been associated with environmental pollutants such as SO2, NO2, PM2.5, and PM10 in Denmark, Hong Kong, and Korea (Andersen et al. 2011; Hu et al. 2010; Ko et al. 2007). In the Americas, more than 131,000 people in low-income countries and 96,000 people in high-income countries are estimated to die each year from air pollution-related diseases.
The elderly, children, and patients with pre-existing chronic diseases are the groups most at risk (WHO 2019). Thus, pollutant emissions negatively affect individuals' quality of life, and other studies have also associated them with an economic burden on the public health system. Ravina et al. (2018) estimated the average cost of hospitalizations for respiratory diseases at 2.5 million euros per year in the city of Turin, Italy. Gao et al. (2018) reported that the world suffered economic losses of 129 billion USD associated with greenhouse gas (GHG) emissions in 2016. In Brazil, an average of 22,000 people are estimated to die prematurely each year because of exposure to outdoor pollutants, especially in urban environments; according to the World Health Organization (WHO 2019), this figure could reach 36,000 people/year by 2040. In recent decades, studies on air pollution and its effects on human health in São Paulo have provided considerable evidence of an association between exposure to atmospheric pollutants such as O3, NO2, SO2, CO, and inhalable PM and increased hospital admissions (Andrade et al. 2017). According to Bravo et al. (2016), more than 99,000 deaths are attributed to air pollution every year. The annual cost of immobility in São Paulo is equivalent to 7.5% of the city's gross domestic product (GDP), which has a significant impact on residents' health (Santana et al. 2020). According to Pinheiro et al. (2014), the Brazilian public health system (SUS) spent more than 2 million USD on air pollution-related illness from 1993 to 1995. This money was spent to treat patients who developed diseases directly related to excess pollutants; it could instead have paid for 784,000 medical consultations or 10,100 normal births in SUS-affiliated hospitals. Mantovani et al.
(2016) reported that in São José do Rio Preto, São Paulo, Brazil, there were about 650 pollutant-related respiratory disease hospitalizations in 3 years, which increased SUS spending by approximately US$ 50,000. Paiva (2014) estimated that in Volta Redonda, Rio de Janeiro, Brazil, the annual cost of hospitalizations owing to pollutant exposure between 2005 and 2007 was US$ 44,000. This effect was also verified by Abe and Miraglia (2016), who reported that reducing the levels of PM10, PM2.5, and O3 could save lives and a considerable amount of money in a country where economic resources are scarce. Moreover, lower air pollution would reduce the demand for hospital inpatient beds, since there would be fewer hospitalizations. It is now known that gaseous pollutants affect human health and burden health systems; however, this assessment depends strongly on air pollution epidemiology and exposure science. Mathematical models that describe the dispersion and below-cloud (precipitation) scavenging of gases and particles in the atmosphere under the influence of wind speed, temperature, pressure, and air humidity are very common (Duhanyan and Roustan 2011; Xu et al. 2019; Zhao et al. 2015). However, no model directly links costs and the number of inpatients to gaseous pollutant emissions. Nevertheless, recent research has connected atmospheric, epidemiological, and mathematical sciences, allowing analysts to quantify an increasing number of health outcomes in far greater detail than was previously possible (Ravina et al. 2018). Yu et al. (2016) used remote sensing to assess the spatial variability of air pollutants in real time.
Remote sensing has an advantage over in situ data for determining the impact of air pollution on human health because it offers better spatial coverage and thus a better understanding of the spatial distribution of the health impact. They used an aggregate risk index to account for the overall impact on human health of exposure to multiple air pollutants, reflecting the linear relationship between air pollution and health risks. Their results showed that the Chinese areas with the highest risk due to air pollution are mainly located in the Taklimakan Desert, northern China, the Sichuan Basin, and central Inner Mongolia. Miranda et al. (2017) developed a statistical model based on a pollutant emission dataset. They found that almost 10,000 deaths/year are associated with PM2.5 and that the population segments most at risk are the elderly and children. In Italy, a model that associates costs with the treatment of diseases caused by air pollution has been developed. This tool, called the DIDEM model, was developed in Matlab® software. It combines the WHO standards with the California puff model (CALPUFF) and the dispersion and external model developed by the Polytechnic of Turin. The DIDEM model takes the pollutant concentrations simulated by the CALPUFF model and applies the concentration-exposure-response functions recommended by the WHO, so that impacts and costs are expressed for the Italian context (Ravina et al. 2018). Polezer et al. (2018) developed an artificial neural network (ANN) to predict the influence of PM2.5 on hospitalizations for respiratory diseases. They tested multilayer perceptron (MLP), extreme learning machine (ELM), and echo state network (ESN) models to assess the influence of PM2.5, temperature, and relative humidity on hospitalizations in the city of Curitiba, Brazil.
They pointed out that neural networks can contribute to epidemiological studies, making it possible to assess impacts even when traditional statistical regression models do not fit the air pollution dataset well. However, there is no record of an ANN being applied to simulate hospitalizations and their costs under the influence of gaseous emissions. Thus, this work aims to obtain an artificial neural network to simulate hospitalizations for respiratory diseases influenced by CO, PM10, PM2.5, NO2, O3, and SO2 emissions recorded at the Alto Tiete station set, in the city of São Paulo, from 2011 to 2017. The costs associated with these hospitalizations will also be calculated. Multilayer perceptron (MLP) and radial basis function (RBF) neural networks will be tested by varying the number of neurons in the hidden layer and the type of output function. Data on the number of hospitalizations for respiratory diseases and their respective costs in hospital units in São Paulo were collected from the TABNET-SUS online platform with the following filters: SUS hospital morbidity, city selection, and time range. In this way, data were collected on the number of hospitalizations and the amount spent by the Brazilian government in public or private hospitals each month from 2011 to 2017. The costs and number of monthly hospitalizations due to respiratory diseases in São Paulo hospitals were quantified and correlated with pollutant emissions in the region during this period (Berssaneti et al. 2014; Saut et al. 2017; Santana et al. 2020). The São Paulo State Environmental Company (CETESB 2019) is the government agency responsible for the control, supervision, monitoring, and licensing of pollution-generating activities in this area, with the fundamental concern of preserving and restoring the quality of water, air, and soil. Data were obtained from the CETESB (2019) website, under the Qualar (air quality) link (http://ar.cetesb.sp.gov.br/padroes-de-qualidade-do-ar/).
Gaseous pollutants were collected at the Alto Tiete station set in the city of São Paulo from 2011 to 2017, and their monthly average concentrations varied between 28 and 63 µg/m³ of PM2.5, 52 and 110 µg/m³ of PM10, 49 and 135 µg/m³ of O3, 0.8 and 2.6 ppm of CO, 41 and 98 µg/m³ of NO2, and 3 and 16 µg/m³ of SO2. CETESB has sensors distributed throughout São Paulo that continuously monitor air quality and send the data to its central line. The pollutant data for this period were therefore collected at the Alto Tiete station set. This central line corresponds to a set of pollutant gas measurement stations distributed among the cities of the metropolitan region of São Paulo. The Alto Tiete station set has 27 stations, but only data from the 15 stations located in neighborhoods of the city of São Paulo were selected for this work. CETESB used beta radiation to determine PM2.5 and PM10, pulsed fluorescence to determine SO2, chemiluminescence to determine NOx, non-dispersive infrared to determine CO, and the ultraviolet method to determine O3. The monthly air pollutant emissions with the greatest impact on population health were monitored according to the relationships presented by Arbex et al. (2012) and Santana et al. (2020), as cited by the WHO (2019) and CONAMA (2019). The exchange rate at the time of the last data collection (1 US$ = R$ 3.80) was used because the amounts paid by the Brazilian government to hospitals had not increased, so this rate gives the values closest to the real ones (SUS 2019). From 2011 to 2017, the average amount paid to hospitals for hospitalization for respiratory diseases was US$ 389.65 ± 74.04 per patient. Total hospitalization costs were calculated using Eq. 1 (Santana et al. 2020; Severo Jr et al. 2007a; Rosa et al. 2014).
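Eq. 1 itself did not survive in this version of the text. One reading consistent with the figures reported here is that the monthly cost equals the number of hospitalizations multiplied by the mean amount paid per patient (US$ 389.65): under that assumption, 1464 and 3483 hospitalizations give 570,447.60 and 1,357,150.95 USD, matching the reported bounds. The sketch below encodes that assumed form together with the stated exchange rate; the function names are hypothetical.

```python
# Sketch of the cost calculation, ASSUMING Eq. 1 has the form
#   monthly cost (USD) = monthly hospitalizations x mean amount paid per patient.
BRL_PER_USD = 3.80      # exchange rate used in the text: 1 US$ = R$ 3.80
MEAN_COST_USD = 389.65  # mean amount paid per hospitalization, 2011-2017
SD_COST_USD = 74.04     # standard deviation of the amount paid per hospitalization


def brl_to_usd(amount_brl: float) -> float:
    """Convert an amount in reais to US dollars at the fixed exchange rate."""
    return amount_brl / BRL_PER_USD


def monthly_cost_usd(hospitalizations: float) -> float:
    """Hypothetical form of Eq. 1: hospitalizations times mean cost per patient."""
    return hospitalizations * MEAN_COST_USD


# Relative spread of the per-patient amount (coefficient of variation, about 19%).
cv = SD_COST_USD / MEAN_COST_USD

for n in (1464, 3483):
    print(f"{n} hospitalizations -> {monthly_cost_usd(n):,.2f} USD")
print(f"coefficient of variation: {cv:.0%}")
```

Because the same mean cost is applied to every month, any month-to-month variation in length of stay or medication is not captured, which is why the cost forecasts inherit the roughly 19% dispersion discussed next.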
The standard deviation of the amount paid per hospitalization (US$ 74.04, about 19% of the mean) is high and will increase the error in cost forecasts. The variation in the price paid per hospitalization is due to the number of days the patient was hospitalized and the patient's medication consumption (Benvenga et al. 2016; Miranda et al. 2018; Severo Jr et al. 2007a). In this work, we tested the radial basis function (RBF) and multilayer perceptron (MLP) artificial neural networks available in the automated neural network tab of the Statistica 10® data mining package. In an RBF network, each neuron in the hidden layer uses a nonlinear radial basis function as its activation function. This layer performs a nonlinear transformation of the input, and the output layer then acts as a linear combiner that maps this nonlinear representation to the output space. The bias of the output-layer neurons can be modeled by an additional neuron in the preceding layer that has a constant activation function (Du and Swamy 2014), as shown in Fig. 1. The outputs are thus generated as a linear combination of the radial basis functions of the inputs and the neuron parameters (Menita et al. 2012; Santana et al. 2010). The multilayer perceptron (MLP) is considered one of the most important ANN architectures. It consists of a set of information-processing units, known as artificial neurons, that form the input layer, one or more hidden layers, and an output layer, as shown in Fig. 2 (Haykin et al. 2009). Information flows from the input layer through the intermediate layers to the output layer, which generates the output response. In an MLP, neurons in successive layers are connected, whereas neurons in the same layer do not communicate.
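As a minimal sketch of the RBF structure described above (Gaussian hidden units, a linear output combiner, and the bias treated as an extra unit with constant activation), the NumPy function below computes a forward pass. The 6-13-1 sizes follow the best network reported in this work, but all centres, widths, and weights are random placeholders rather than fitted values, and the exact Gaussian parameterization used by Statistica is an assumption.

```python
import numpy as np


def rbf_forward(x, centers, widths, weights, bias):
    """Forward pass of a Gaussian RBF network with a single linear output.

    x       : (n_samples, n_inputs) inputs, e.g. monthly pollutant averages
    centers : (n_hidden, n_inputs) centre of each radial unit
    widths  : (n_hidden,) spread (sigma) of each Gaussian unit
    weights : (n_hidden,) output-layer weights
    bias    : weight of an extra unit whose activation is fixed at 1
    """
    # Squared Euclidean distance from every sample to every centre.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Nonlinear hidden layer: Gaussian activations between 0 and 1.
    phi = np.exp(-d2 / (2.0 * widths[None, :] ** 2))
    # Linear output layer: weighted sum of hidden activations plus the bias unit.
    return phi @ weights + bias


rng = np.random.default_rng(0)
n_inputs, n_hidden = 6, 13  # 6 pollutants in, 13 Gaussian units
centers = rng.normal(size=(n_hidden, n_inputs))  # placeholders, not fitted values
widths = np.ones(n_hidden)
weights = rng.normal(size=n_hidden)
x = rng.normal(size=(4, n_inputs))
y = rbf_forward(x, centers, widths, weights, bias=0.5)
print(y.shape)  # (4,)
```

Only the output weights enter linearly; the centres and widths make the mapping nonlinear, which is the division of labour between the two layers described in the paragraph.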
Training, that is, the adjustment of the weights, is performed in two phases: first, forward propagation, in which the signals from a training sample are presented as network inputs and propagated layer by layer; second, the errors are propagated backwards recursively and the weights are adjusted by an update rule (Haykin et al. 2009; Siqueira et al. 2014). The most commonly used method to train an MLP is the steepest descent algorithm, with the derivatives calculated by backpropagation. An MLP consists of at least three layers of neurons, the central one(s) being called hidden layer(s); each node has a nonlinear activation function, and the network is trained with the supervised backpropagation learning technique (Rosa et al. 2013; Santana et al. 2015; Severo Jr et al. 2007b). The ANNs were trained with data obtained from the CETESB (2019) and SUS (2019) databases. For this study, regression networks were selected, since they can be applied to data simulation and prediction. Two types of neural networks were tested: a radial basis function network (RBF) and a multilayer perceptron (MLP). The RBF was configured with 6 input neurons (PM10, PM2.5, O3, CO, SO2, and NO2) and 1 output (hospitalizations), with 12 to 25 neurons in the hidden layer. In this case, the software configuration automatically selects the Gaussian radial basis function as the network activation function and a linear function as the output function (Severo Jr et al. 2007b; Menita et al. 2012; Santana et al. 2010). The BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm was used for training. In addition, 1,000 seeds were used to control initialization, with weight decay for the output and hidden layers ranging from a minimum of 0.0001 to a maximum of 0.001 (Menita et al. 2012; Santana et al. 2010).
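The two training phases described above (forward propagation, then backward propagation of the error with a steepest-descent weight update) can be sketched for a one-hidden-layer MLP in NumPy. This is an illustrative toy, not the Statistica configuration: the tanh hidden layer, linear output, learning rate, and epoch count are assumptions, the data are a synthetic 1-D curve, and the best network in this work was an RBF trained with BFGS rather than steepest descent.

```python
import numpy as np


def train_mlp(x, y, n_hidden=5, lr=0.05, epochs=3000, seed=0):
    """Train a 1-hidden-layer MLP (tanh hidden units, linear output) by
    steepest descent on half the mean squared error, using backpropagation."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(x)
    for _ in range(epochs):
        # Phase 1: forward propagation, layer by layer.
        h = np.tanh(x @ w1 + b1)
        err = h @ w2 + b2 - y
        # Phase 2: propagate the error backwards to obtain the gradients.
        grad_w2 = h.T @ err / n
        grad_b2 = err.mean()
        delta_h = np.outer(err, w2) * (1.0 - h ** 2) / n  # tanh'(z) = 1 - tanh(z)^2
        grad_w1 = x.T @ delta_h
        grad_b1 = delta_h.sum(axis=0)
        # Steepest-descent update: step against the gradient.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(x_new):
        return np.tanh(x_new @ w1 + b1) @ w2 + b2

    return predict


# Toy usage on a smooth 1-D target (not the hospitalization data).
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = x[:, 0] ** 2
untrained = train_mlp(x, y, epochs=0)
trained = train_mlp(x, y)
print(np.mean((untrained(x) - y) ** 2), np.mean((trained(x) - y) ** 2))
```

BFGS replaces the fixed step along the negative gradient with a step that uses an approximation of the inverse Hessian built from successive gradients, which usually converges in far fewer iterations on problems of this size.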
The MLP was configured with the same 6 neurons in the input layer and 1 neuron in the output layer, with 3 to 20 neurons in the hidden layer. Logistic, hyperbolic tangent, sine, and exponential functions were tested as activation functions for both the hidden layer and the output neuron (Rosa et al. 2013; Santana et al. 2015). For all networks tested in this work, the sum of squared errors (SSE) was used as the error function. A random sample of 70% of the experimental data was used for training, and 15% was used for retraining or validation. One thousand training runs and 20 retraining runs were performed. The criterion for choosing the best neural network architecture was the multiple correlation (R²) in the training, testing, and validation stages: the closer R² was to 1.0, the better the neural network was considered to fit the experimental data. The two-tailed F test was also used to check for significant differences between the means of the real datasets and those predicted by the neural network. R² and mean errors were used as fitting parameters. The error was calculated according to Eq. 2. The neural network that best fit the experimental data was an RBF with 6 input neurons, 13 hidden-layer neurons, and 1 output neuron, trained with the BFGS algorithm and using a Gaussian activation function. The multiple correlations (R²) of the training, testing, and validation steps were 0.7267, 0.7959, and 0.8557, respectively. Predictions should not be made outside the overall intervals (minimum and maximum) stipulated for each input neuron. Using input values within these ranges, the outputs lie between 1464 and 3483 ± 510 hospitalizations, with corresponding costs between 570,447 and 1,357,151 ± 198,171 USD per month. Figure 3 shows the dispersion graph of the actual data in relation to the predicted values.
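The evaluation steps described above can be sketched as follows: a random 70/15/15 partition (the last 15% is assumed to be the test set, since the text states only the training and validation shares), R² taken as the squared Pearson correlation between observed and predicted values (one common reading of "multiple correlation"), and the F test read as a one-way ANOVA on two group means. The function names are hypothetical, and the p-value, which would come from an F(1, n1 + n2 − 2) distribution in a statistics library, is omitted to keep the block NumPy-only.

```python
import numpy as np


def split_indices(n, train=0.70, valid=0.15, seed=0):
    """Randomly partition n samples into training, validation and test index
    arrays; the test set receives whatever remains (about 15%)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train * n))
    n_valid = int(round(valid * n))
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def r_squared(observed, predicted):
    """R^2 taken as the squared Pearson correlation of observed vs predicted."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2


def anova_f_two_groups(a, b):
    """One-way ANOVA F statistic for the difference between two group means.
    Returns (F, df_between, df_within)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    grand_mean = np.concatenate([a, b]).mean()
    ss_between = len(a) * (a.mean() - grand_mean) ** 2 + len(b) * (b.mean() - grand_mean) ** 2
    ss_within = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
    df_between, df_within = 1, len(a) + len(b) - 2
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Monthly data from 2011 to 2017 inclusive gives 84 records.
train_idx, valid_idx, test_idx = split_indices(84)
print(len(train_idx), len(valid_idx), len(test_idx))  # 59 13 12
```

A low F value means the between-group difference in means is small relative to the within-group scatter, which is the sense in which "no significant differences between the means" is used in the results.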
As noted, there is high variation in the data owing to non-causal behavior, which makes mathematical modeling difficult and indicates that a neural network is better suited to such random behavior. Figure 4 shows the variation of the real number of hospitalizations with its predicted value. Although most of the real (experimental) data are close to the predicted values, several scattered points compromised the fit of the neural network. We chose not to delete these data because they cannot be considered outliers. The neural network was thus able to predict 85.57% of the experimental data (validation R² = 0.8557). Moreover, the F-test value was low, indicating no significant differences between the means of the datasets, and a mean error below 2% supports the accuracy of the predicted results. This degree of prediction can be considered very good, because emissions are subject to several uncontrollable influences, such as the number of vehicles circulating in the city of São Paulo, fuel consumption, fuel type, temperature, wind, and rainfall. Admissions, in turn, depend on the amount of emissions, the duration of contact with pollutants, people's resistance to disease, the number of people in risk groups exposed to pollutants, and other factors (Menita et al. 2012; Ravina et al. 2018). Ten simulations of gaseous emission scenarios were run to test the predictive power of the neural network, as shown in Tables 1 and 2. For these simulations, the monthly average emission of each pollutant was used individually. The real field lists the actual number of hospitalizations recorded in DATA-SUS, and the predicted value was estimated by the neural network. The percentage error of the network forecasts ranged from − 2 to 25% of the total monthly hospitalizations.
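Eq. 2, E = 100 × (1 − Predict/RealValue), can be written as a small Python function. With this sign convention, a positive E means the network under-predicted and a negative E means it over-predicted, which is how a range such as −2% to 25% arises. The values in the usage lines are hypothetical and are not taken from Tables 1 and 2.

```python
def percentage_error(predicted: float, real: float) -> float:
    """Eq. 2: E = 100 * (1 - predicted / real).

    E > 0 when the network under-predicts, E < 0 when it over-predicts.
    """
    if real == 0:
        raise ValueError("the real value must be non-zero")
    return 100.0 * (1.0 - predicted / real)


# Hypothetical month: 2000 predicted vs 2500 recorded hospitalizations.
print(round(percentage_error(2000, 2500), 2))  # 20.0
# Hypothetical over-prediction by 2%.
print(round(percentage_error(2550, 2500), 2))  # -2.0
```

Because the error is relative to the real value, the same absolute miss weighs more in months with few hospitalizations than in months with many.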
These percentage errors, computed with Eq. 2, $E = 100\left(1 - \frac{\mathrm{Predict}}{\mathrm{RealValue}}\right)$, are considered high but satisfactory, because in the environmental and social fields 75% accuracy is considered very good (Menita et al. 2012; Ravina et al. 2018; Yu et al. 2016). Cost errors fluctuated between − 7 and 26%, because they combine the errors in the hospitalization forecasts with the errors associated with cost variations. The cost errors were larger because costs were obtained from Eq. 1 (not by the neural network), which uses the average of all amounts paid for hospitalizations. The amount that the Brazilian government pays per day of hospitalization is constant; however, the DATA-SUS website does not show the number of days that patients were hospitalized, only the total amount paid for their hospitalization. Using only PM2.5, the neural network fits presented by Polezer et al. (2018) were the same as those obtained in this work. This is due to the seasonality of gaseous emissions data, which are also influenced by air humidity, temperature, and pressure at the measurement site (Xu et al. 2019; Zhao et al. 2015). In addition, Natali et al. (2011) showed that children and adolescents account for 40% of hospitalizations for respiratory diseases and that these diseases correspond to 30% of total hospitalizations in the city of São Paulo. Thus, because reducing gaseous emissions reduces hospitalizations for respiratory diseases, hospital beds would be freed for other purposes. According to Chiquetto et al. (2019), over 2 million low-income people live in areas with pollutant gas levels above those recommended by the WHO. Public policies aimed at improving air quality should therefore be encouraged in order to decrease air pollution. According to Abhijith et al. (2017) and Janhäll (2015), green walls and roofs on buildings, together with tree planting and tall vegetation along streets and roads, should be used to improve air quality in cities.
Thus, investment policies in green urban infrastructure should be adopted to minimize the effects of gaseous pollutants on human health in the city of São Paulo. According to Ravina et al. (2018), accounting for the effects of air pollution on health costs is a useful indicator to support decisions and information at all management levels. With the economic advantages of the ecological cost accounting presented in this work, the resources saved could be redirected towards expanding existing hospitals or building new ones, increasing hospital bed capacity in the city of São Paulo (Gao et al. 2018; Santana et al. 2020; Saut et al. 2017). This work thus shows how governments can better plan their investments to minimize hospitalizations for respiratory diseases and increase the number of hospital beds available for other diseases such as COVID-19. The neural network that best fit the experimental data was an RBF with 6 input neurons, 13 hidden-layer neurons, and 1 output neuron, trained with the BFGS algorithm and using a Gaussian activation function. This ANN was able to predict 85.57% of the experimental data, showed an error below 2%, and, according to the F test, produced no significant differences between the means of the datasets. Within the studied ranges of the gaseous pollutants, it is possible to predict the hospitalization of 1464 to 3483 ± 510 patients, with costs between 570,447 and 1,357,151 ± 198,171 USD per month. Thus, knowing the monthly concentrations of gaseous pollutants, the cost of admissions can also be forecast, and this information is a useful indicator to support decisions at all management levels. The limitation of this work is the range of pollutant gas concentrations covered, as the neural network cannot guarantee good results if the input data are outside the ranges used in this research.
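Because the network is only validated within the concentration ranges observed from 2011 to 2017, a caller could check inputs before predicting. The sketch below hard-codes the monthly-average ranges reported in this work; the dictionary keys, the function name, and the sample month are hypothetical.

```python
# Ranges of the monthly-average inputs observed at the Alto Tiete station set, 2011-2017.
TRAINING_RANGES = {
    "PM2.5": (28.0, 63.0),  # µg/m3
    "PM10": (52.0, 110.0),  # µg/m3
    "O3": (49.0, 135.0),    # µg/m3
    "CO": (0.8, 2.6),       # ppm
    "NO2": (41.0, 98.0),    # µg/m3
    "SO2": (3.0, 16.0),     # µg/m3
}


def out_of_range(inputs: dict[str, float]) -> list[str]:
    """Return the pollutants whose value lies outside the training range,
    i.e. where a prediction would be an extrapolation."""
    missing = set(TRAINING_RANGES) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return [name for name, (low, high) in TRAINING_RANGES.items()
            if not low <= inputs[name] <= high]


# Hypothetical month with ozone above the studied range.
month = {"PM2.5": 40.0, "PM10": 80.0, "O3": 150.0, "CO": 1.2, "NO2": 60.0, "SO2": 10.0}
print(out_of_range(month))  # ['O3']
```

A non-empty result signals that the network would be extrapolating, so its hospitalization and cost outputs for that month should not be relied on.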
Thus, this neural network can be used to predict the costs of hospitalizing patients for respiratory diseases and to support decisions on how much the government should spend on health care.
Avaliação do impacto na saúde da poluição do ar em São Paulo, Brasil
Air pollution abatement performances of green infrastructure in open road and built-up street canyon environments - a review
Chronic obstructive pulmonary disease and long-term exposure to traffic-related air pollution: a cohort study
Ensemble method based on artificial neural networks to estimate air pollution health risks
Air pollution and the respiratory system
Genetic algorithm applied to study of the economic viability of alcohol production from Cassava root from 2002 to 2013
Engineering, procurement and construction (EPC): what are the variables that impact the success of the projects currently running in Brazil
Air pollution and mortality in São Paulo, Brazil: effects of multiple pollutants and analysis of susceptible populations
Environmental Sanitation Technology Company of São Paulo, Brazil. Air quality - Qualar
Impacts of air pollution on human and ecosystem health, and implications for the National Emission Ceilings Directive: Insights from Italy
Radial basis function networks
Below-cloud scavenging by rain of atmospheric gases and particulates
Public health co-benefits of greenhouse gas emissions reduction: a systematic review
Traffic-related air pollution and the development of asthma and allergies during the first 8 years of life
Risk of COPD from exposure to biomass smoke: a meta-analysis
Review on urban vegetation and particle air pollution - deposition and dispersion
Temporal relationship between air pollutants and hospital admissions for chronic obstructive pulmonary disease in Hong Kong
Air quality and health
Poluentes do ar e internações devido a doenças cardiovasculares em São José do Rio Preto
Simulation and optimization of a biscuit processing production in an industrial scale by use of MLP and RBF neuro fuzzy network
The relationship between aerosol particles chemical composition and optical properties to identify the biomass burning contribution to fine particles concentration: a case study for Sao Paulo city
Analysis of the costs and logistics of biodiesel production from used cooking oil in the metropolitan region of Campinas (Brazil)
Hospital admissions due to respiratory diseases in children and adolescents of São Paulo city
Hospital morbidity due to diseases associated with air pollution in the city of Volta Redonda, Rio de Janeiro: cases and economic cost
Isolated and synergistic effects of PM10 and average temperature on cardiovascular and respiratory mortality
Assessing the impact of PM2.5 on respiratory disease using artificial neural networks
A bootstrapped neural network model applied to prediction of the biodegradation rate of reactive Black 5 dye
DIDEM - an integrated model for comparative health damage costs calculation of air pollution
Diesel exhaust exposure, its multi-system effects, and the effect of new technology diesel exhaust
Applying of a neural network in effluent treatment simulation as a environmental solution for textile industry
Development of colors with sustainability: a comparative study between dyeing of cotton with reactive and vat dyestuffs
Applying of neural network on the wine sensorial analysis from Barbados Cherry
Simulation of biodegradation process of wastewater from meat industry by means of a multilayer perceptron artificial neural network
Effects of air pollution on human health and costs: current situation in São Paulo
Exposure to fine particles increases blood pressure of hypertensive outdoor workers: a panel study
Evaluating the impact of accreditation on Brazilian health care organizations: a quantitative study
Wine clarification from Spondias mombin L. pulp by hollow fiber membrane system
Response surface methodology to evaluation the recovery of amylases by hollow fiber membrane
Unorganized machines for seasonal streamflow series forecasting
Accessed November
WHO, World Health Organization. Air quality guidelines. Global update 2005. Particulate matter, ozone, nitrogen dioxide and sulfur dioxide
Multimethod determination of the below-cloud wet scavenging coefficients of aerosols in Beijing
Assessment of human health impact from exposure to multiple air pollutants in China based on satellite observations
Below-cloud scavenging of aerosol particles by precipitation in a typical valley city, northwestern China
The authors declare no competing interests.