key: cord-0895531-uczexjjx authors: Wang, Deyun; Yuan, Ying-an; Ben, Yawen; Luo, Hongyuan; Guo, Haixiang title: Long short-term memory neural network and improved particle swarm optimization–based modeling and scenario analysis for municipal solid waste generation in Shanghai, China date: 2022-05-14 journal: Environ Sci Pollut Res Int DOI: 10.1007/s11356-022-20438-0 sha: a87a5419d888fd73cc9e221ce960a1dacbb79bc1 doc_id: 895531 cord_uid: uczexjjx Accurate estimations of municipal solid waste (MSW) generation are vital to effective MSW management systems. While various single-point estimation approaches have been developed, the non-linearity and multiple site-specific influencing factors associated with MSW management systems make it challenging to forecast MSW generation quantities precisely. To address these concerns, this study developed a two-stage modeling and scenario analysis procedure for MSW generation and taking Shanghai as a test case demonstrated its viability. In the first stage, nine influencing factors were selected, and a hybrid novel forecasting model based on a long short-term memory neural network and an improved particle swarm optimization (IPSO-LSTM) was proposed for the forecasting of the MSW generation quantities, after which actual Shanghai data from 1980 to 2019 were used to test the performance. In the second stage, the future influencing variable values in different scenarios were predicted using an improved grey model, after which the predicted Shanghai MSW generation quantities from 2025 to 2035 were evaluated under various scenarios. It was found that (1) the proposed IPSO-LSTM had higher accuracy than the benchmark models; (2) the MSW generation quantities are expected to respectively increase to 9.971, 9.684, and 9.090 million tons by 2025 and 11.402, 11.285, and 10.240 by 2035 under the low, benchmark, and high scenarios; and (3) the MSW generation differences between the high and medium scenarios were decreasing. 
Due to China's rapid urbanization and economic development over the past few decades, municipal solid waste (MSW) generation quantities have increased significantly, making China the second largest global MSW producer behind the USA. MSW is heterogeneous comprising biodegradable waste, medical waste, electrical and electronic waste, and construction waste. Some waste, such as electronic products (phones, televisions, and computers) , are hazardous, do not degrade quickly, and therefore can have negative impacts on the environment and people's health. However, if MSW is disposed of correctly, it can be converted into valuable raw materials, such as gardening and agricultural fertilizer, energy, solid fuel, and higher-value byproducts (Liu et al., 2018) . Because of the lack of adequate MSW infrastructures in many places and especially in developing areas, MSW is often poorly managed, resulting in poor environmental protection and resource utilization. Adequate and sustainable waste management infrastructure planning depends on the ability to reliably estimate future MSW generation (Kolekar et al., 2016) , which requires a consideration of the expected demographic, social, and economic factor changes; therefore, accurate MSW forecasting is complex and challenging (Nguyen et al., 2021a) . MSW quantitative prediction methods can be divided into three main categories: causal relationship forecasting methods, time-series forecasting methods, and hybrid forecasting methods. Causal relationship forecasting methods based on correlation prediction principles require an approximate expression of the functional relationships between the influencing factors and MSW generation (Ghinea et al., 2016) . Therefore, accurate MSW generation predictions depend upon the selection of pertinent social and economic development influencing factors in the study area. 
Based on the quantitative interdependence relationships between the variables, the causal relationship forecasting MSW prediction methods can be divided into either linear forecasting models or nonlinear forecasting models. Multiple linear regression (MLR), which uses polynomials to reflect the functional relationships between various influencing factors and MSW generation, has been one of the most commonly used linear regression (LR) analysis models. Araiza-Aguilar et al. (2020) used MLR to explore the impact of economic, social, and demographic explanatory variables on MSW generation and found that the model had excellent prediction performances. However, the LR method prediction accuracy highly depends on the rationality and comprehensiveness of the selected indicators, and as the MSW generation system is complex, multivariable, and nonlinear, linear functional relationships are generally unable to adequately describe the relationships between the independent and dependent variables. The highly nonlinear characteristics of the variables also limit forecasting accuracy and any extensive application of LR models. As artificial intelligence (AI) models, such as the artificial neural network (ANN), the adaptive neuro-fuzzy inference system (ANFIS), decision trees (DT), support vector machines (SVM), and fuzzy logic (FL), can mimic the human reasoning process, they have been extensively used for highly nonlinear MSW generation systems (Abdallah et al., 2020) . ANNs, such as the radial basis function (RBF), multi-layer perceptron (MLP), and backward propagation (BP), have been the most commonly used AI models, followed by SVM. Despite their excellent prediction accuracy, ANNs have rigid input sequence constraints and disregard long-term impacts in their sequential data analysis (Niu et al., 2021) . 
To overcome these shortcomings, due to their effectiveness in extracting input series characteristics, deep learning approaches, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and attention algorithms, have been used for modeling and forecasting in recent years (Sun and Ge, 2021) . While RNNs have been typically employed to describe long-term span series nonlinear dynamic processes, they still suffer from gradient explosions and vanishing, which means that long sequential modeling has significant challenges (Yuan et al., 2020) . Hochreiter and Schmidhuber (Hochreiter and Schmidhuber, 1997 ) established a long short-term memory (LSTM) neural network to address these issues, which employed the RNN to efficiently memorize the historical input sequence evolution law and then during the training procedure used the distinctive LSTM three-gate structure and cell state to evaluate the temporal span influence when processing input sequences in batches to determine the duration of the impacts (Solano Meza et al., 2019) . Significant air quality (Yan et al., 2021) , wind power (Shahid et al., 2021) , and solar energy consumption (Nguyen et al., 2021b) research has applied the LSTM neural network to causal relationship analysis; however, few studies have employed it for MSW generation forecasting. Time-series models, which employ historical data statistical analysis on the dependent variable to determine the variation patterns (Chang and Lin, 1997) , have been successfully applied to forecast the seasonal and periodic variations in MSW generation (Navarro-Esbrı́ et al., 2002) . There are three most often used time-series methods: conventional statistical methods, decomposition-ensemble-based approaches, and AI-based methods. ARIMA, a conventional time-series analytical approach, has been widely employed in MSW generation prediction because of its sophisticated mathematical theory and its capacity to depict data reproducibility (Chao, 2008) . 
The decomposition-ensemble-based approach, which was inspired by the "divide and conquer" concept, involves time-series decomposition, modal prediction, and integrated prediction (Yu et al., 2008) and has been successfully implemented to predict e-waste. For example, Wang et al. (2021) used variational modal decomposition (VMD) to decompose time-series data, an exponential smoothing model (ESM) to forecast each time-frequency signal, and grey modeling (GM) to integrate each time-frequency signal, with the experimental findings demonstrating that the VMD-ESM-GM technique was able to accurately predict future e-waste fluctuation trends. In contrast to the highly nonlinear MSW generation systems described by AI regression models, the AI time-series models are able to describe the fluctuation trends in pure time series and anticipate future values. Sunayana and Kumar (2021) employed a nonlinear auto-regressive (NAR) neural network to forecast short-term monthly MSW generation in Nagpur, India, until 2023. However, if historical MSW generation data are missing or of poor quality, AI-based methods may not be effective. In general, hybrid models deal with integrated problems or perform diverse tasks (Abdallah et al., 2020). There are three main hybrid model types: combined causal relationship forecasting methods, combined diverse time-series approaches, and cross-mixing approaches. Xu et al. (2013) developed a hybrid method that integrated the seasonal auto-regressive integrated moving average (SARIMA) and a GM(1,1) model to forecast month-scale, medium-term, and long-term MSW output, and then validated its performance using data from Xiamen, China. Chen and Chang (2000) proposed a hybrid forecasting approach based on grey theory, fuzzy theory, and system dynamics (GFM) and validated the model's efficiency using 1985 to 1998 solid waste production time-series data from Tainan, China, finding that the prediction accuracy of the GFM method was substantially higher than that of the single GM model.
Taking 1990 to 2018 Shanghai MSW data as an example, Lin et al. (2021) selected 25 generation factors and then developed a hybrid forecasting method based on a one-dimensional CNN, LSTM, and attention mechanism to depict the highly nonlinear MSW generation system. However, these hybrid approaches have limitations. For example, because time series methods have poor fitting capacity on sharp change points, it is difficult to anticipate the MSW generation from the formation mechanism. Although the causal relationship forecasting approach can avoid these issues, its application to forecasting future MSW generation is restricted as it does not know the value of each influencing factor in the future, and even though current hybrid MSW generation forecasting studies combine different causal relationship and time-series methods, few studies have combined the two to address the implementation deficiencies and identify the complex MSW production causalities. Taking Shanghai as an example, which is a highly developed Chinese city with a large resident population, this study proposes a hybrid forecasting model to accurately predict the city's MSW generation from 2025 to 2035. First, an improved GM (1,1) is employed to effectively predict the various future influencing factors, after which an IPSO is used to optimize the LSTM hyper-parameters and obtain the prediction model that has the best fitting effect. Finally, a scenario analysis approach is adopted to evaluate the 2025 to 2035 MSW generation and provide realistic proposals for a viable environmental protection governance system in Shanghai. 
Therefore, the main contributions and innovations are as follows: (1) a novel combined adjustment strategy is proposed to improve the global search capacity and convergence efficiency of the conventional PSO algorithm; (2) a novel hybrid model (IPSO-LSTM) based on an improved PSO algorithm and the LSTM neural network is proposed to ensure precise MSW generation forecasting; (3) a novel improved GM(1,1), the IGM(1,1), is established for the influencing factor time series forecasting from 2025 to 2035; and (4) a scenario analysis method based on IPSO-LSTM and IGM(1,1) is proposed that makes up for the shortcomings in single time series forecasting or causal relationship forecasting methods, accounts for the diverse future MSW generation growth trends, and pays attention to the factors and coordination relationship analyses crucial to the system's development. The remainder of this paper is structured as follows: "Material and methods" details the proposed hybrid forecasting model construction process, "Proposed model performance test" compares and verifies the effectiveness of IPSO-LSTM and IGM(1,1), "Shanghai MSW generation scenario analyses" adopts the IGM(1,1) to predict the influencing factor values under different development prospects and accurately forecast the MSW generation in each IPSO-LSTM model scenario, and "Conclusions" provides some concluding remarks and valuable extensions to the current study. As Shanghai is a leading demonstration city for China's waste classification policy, its trash categorization and recycling management policies are needed to strengthen the city's image and soft power. However, because of Shanghai's increasing business activities, there has been a significant escalation in consumption, which in turn has led to a significant rise in MSW generation. Figure 1 shows the MSW generation on the Chinese mainland in 2019, from which it can be seen that Shanghai's waste generation is comparatively high. 
The Comprehensive Urban Planning of Shanghai (2017-2035) (CUPS) explicitly stated that by 2035, Shanghai is expected to be a socialist metropolis with an international impact. Therefore, accurate MSW generation forecasting in Shanghai over the medium to long term is vital for the planning and construction of related infrastructures, such as waste collection, transportation, and incineration; the establishment of scientific, comprehensive, environmental protection management systems; and ecological city sustainability. The data for this study were collected from the Shanghai statistical yearbooks published on the official Shanghai Bureau of Statistics website (http://tjj.sh.gov.cn). MSW generation is impacted by multiple elements, with most MSW studies having focused on social, economic, and demographic influencing variables. Therefore, based on previous MSW generation studies (Kannangara et al., 2018; Liu et al., 2021; Nguyen et al., 2021a) (Fig. 2), nine factors were selected: permanent resident population (PRP); the number of overseas tourists (NOT); gross domestic product (GDP); tertiary industry (TI); the consumer price index (CPI); the retail price index (RPI); urban per capita consumption expenditure (UPCCE); urban per capita disposable income (UPCDI); and the total investment in fixed assets (TIFA). The historical data and change trends for these nine factors are shown in Fig. 3. A correlation test was conducted separately between each factor and MSW generation, the results for which are shown in Table 1. The Pearson's correlation coefficients between these nine influencing factors and MSW were all greater than 0.8, which indicated that all were significantly associated with MSW and could be used to forecast MSW generation.
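The correlation screening described above can be illustrated with a minimal sketch. The function names `pearson_r` and `screen_factors`, the 0.8 threshold argument, and the two synthetic series are assumptions introduced for illustration only; they are not the Shanghai yearbook data and not the study's code.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def screen_factors(factors, target, threshold=0.8):
    """Keep the factors whose |r| with the target exceeds the threshold."""
    scores = {name: pearson_r(values, target) for name, values in factors.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}


# Synthetic placeholder series (hypothetical, NOT the yearbook data).
years = np.arange(10)
msw = 1.0 + 0.25 * years
demo_factors = {
    "trend_like": 5.0 + 2.0 * years,            # moves in step with the target
    "alternating": np.array([1.0, -1.0] * 5),   # oscillates, weak linear link
}
selected = screen_factors(demo_factors, msw)
```

In this toy case only the trend-like series passes the threshold. With real data, each of the nine factors would be passed in as a 40-value annual series alongside the MSW series.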
Considering that there is a high correlation between all explanatory variables (except RPI), which may cause multicollinearity and negatively influence the model's robustness and stability, this research adopted L2 regularization (Deng et al. 2022), also known as ridge regression, to address this problem given the problem scale and data characteristics. Figure 4 shows the MSW quantities in Shanghai from 1980 to 2019, from which it can be seen that there was a general growth tendency from 1.31 million tons in 1980 to 10.38 million tons in 2019. There was an exceptional upsurge in 2000 and 2001, with no significant change in the relevant influencing factors in the same period, which was primarily the result of a government campaign that permitted kitchen waste from the catering sector to be discharged into the MSW (Xiao et al., 2020). Therefore, to ensure prediction accuracy, the Shanghai MSW data collected from the National Bureau of Statistics (http://www.stats.gov.cn) in 2000 and 2001 were replaced.
The LSTM neural network is a special type of RNN that performs better than a conventional RNN in describing historical input sequence evolutionary rules. As shown in Fig. 5, the key to the LSTM is the information transmission path through the cell state and the information addition and removal process. The cell state (long-term influence) is updated by scaling the previous cell state with the forget gate output and adding the candidate state scaled by the input gate output, which is calculated as
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{1}$$
where $C_t$ is the new cell state, $f_t$ is the forget gate output, $C_{t-1}$ is the previous cell state, $i_t$ is the input gate output, $\tilde{C}_t$ is the candidate state, and $\odot$ denotes element-wise multiplication. The long-term influence is modulated by the LSTM using a unique three-gate structure (forget gate, input gate, and output gate).
The forget gate determines the duration of the long-term effect and uses a Sigmoid activation function to map its input to a value between 0 and 1 to select the information to be retained from the previous cell state. The forget gate is described as follows:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{2}$$
where $f_t$ is the forget gate output, $\sigma$ is the Sigmoid activation function, $W_f$ and $b_f$ denote the linear relationship coefficient and bias of the forget gate, $X_t$ is the input sequence, and $h_{t-1}$ is the previous hidden state. The input gate adds the immediate state to the long-term effect by selecting the information to be updated using the Sigmoid function, after which the input sequence is compressed between −1 and 1 using the tanh function. The input gate mathematical expressions are as follows:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{3}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \tag{4}$$
where $W_i$ and $b_i$ in formula (3) are respectively the input gate linear relationship coefficient and bias, and in formula (4), tanh denotes the hyperbolic tangent activation function, and $W_c$ and $b_c$ are the candidate state linear relationship coefficient and bias. The output gate is responsible for determining whether the long-term effect should be considered and controls the output of the current cell state with the aid of the Sigmoid and tanh activation functions, as follows:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \tag{5}$$
$$h_t = o_t \odot \tanh(C_t) \tag{6}$$
where $o_t$ in formula (5) is the output gate output, $W_o$ and $b_o$ are the output gate linear relationship coefficient and bias, $h_t$ in formula (6) is the new hidden state, and $C_t$ is the new cell state. Therefore, the LSTM overcomes the short-term memory limitation through its three-gate structural design, which allows the information from earlier time steps to be reflected in the current cell state.
Particle swarm optimization (PSO), which was developed by Eberhart and Kennedy in 1995 based on the foraging behavior of bird flocks, is a population-based heuristic method for addressing optimization problems (Marini and Walczak, 2015).
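Before turning to the PSO details, the LSTM gate computations described above can be sketched as a single time step in NumPy. This is a minimal illustration of the standard LSTM cell, not the study's implementation: the function names `lstm_step` and `init_params`, the weight layout (one matrix per gate acting on the concatenation of the previous hidden state and the current input), the hidden size of 4, and the random inputs are all assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name ("f", "i", "c", "o") to a pair (W, b), where W
    has shape (hidden, hidden + inputs) and acts on [h_prev, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: what to keep from c_prev
    i_t = sigmoid(W_i @ z + b_i)          # input gate: what to write
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate state in (-1, 1)
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t


def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights and zero biases (illustrative initialisation)."""
    rng = np.random.default_rng(seed)
    shape = (n_hidden, n_hidden + n_inputs)
    return {g: (rng.normal(scale=0.1, size=shape), np.zeros(n_hidden)) for g in "fico"}


# Run a hypothetical sequence of 5 time steps with 9 input features.
params = init_params(n_inputs=9, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
sequence = np.random.default_rng(1).normal(size=(5, 9))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)
```

Because the cell state is carried forward additively and only rescaled by the forget gate, information from early time steps can persist across the loop, which is the property the three-gate design is meant to provide.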
The PSO algorithm seeks an optimal solution through collaboration and information sharing among the individuals in the group; that is, it simulates a flock of birds using massless particles that only have speed and position attributes. The PSO is initialized as a set of random particles, with each particle separately searching for the optimal solution in the solution space. In each iteration, based on the current personal best (pbest) and current global best (gbest) values, the particle calculates its speed for the next moment, after which it updates its position. Assuming there are j particles in a given region, the velocity vector and position vector update formulas for the ith particle at iteration (k + 1) are as follows:
$$v_i^{k+1} = w\,v_i^{k} + c_1 r_1 \left(pbest_i^{k} - x_i^{k}\right) + c_2 r_2 \left(gbest^{k} - x_i^{k}\right) \tag{7}$$
$$x_i^{k+1} = x_i^{k} + v_i^{k+1} \tag{8}$$
where parameter $w$ denotes the inertia weight, which reflects the impact of the previous iteration's speed on the present iteration; the higher the value, the better the global optimization ability and vice versa; $c_1$ and $c_2$ are the individual learning and population learning coefficients that respectively indicate the local optimization and global optimization abilities, and $r_1$ and $r_2$ are random numbers between 0 and 1. The traditional PSO algorithm has a premature convergence problem, that is, each particle moves toward the current local optimum while searching, which gradually diminishes the diversity of the particles, meaning that the particles often cannot move away from this local optimal solution (Feng et al., 2021). To resolve this issue, an improved PSO (IPSO) method is proposed by (1) employing a dynamic inertia weight adjustment approach based on the Versoria function and (2) incorporating an adaptive mutation operation in the iterative calculation process. The Schwefel 1.2 function was adopted in this study to validate the performance of the improved PSO algorithm. The details for these two improvements are given in the following. Step 1.
Inertia weight adjustment based on the Versoria function The inertia weight is a constant in the conventional PSO algorithm. However, previous studies have shown that a dynamic inertia weight coefficient can achieve superior optimization performance because a higher inertia weight at the start of the iterations prevents the global search from slipping into a local optimum too early. Toward the end of the iterations, in contrast, a reduced inertia weight should be employed to guarantee that the algorithm converges faster and does not bounce out of the optimal solution. For example, Shi and Eberhart (1998) adopted a linearly decreasing dynamic inertia weight adjustment strategy. Although the inertia weight under this strategy is no longer constant, its change rate is still a constant, which implies that the algorithm has the same capacity for global optimization and local optimization and may not be able to achieve the global optimal solution. Chen et al. (2006) proposed a quadratic function-based approach for diminishing the inertia weight. However, although the inertia weight change rate under this strategy changes continuously as the number of iterations increases, the change rate is faster in the early stage and slower in the later stage, which suggests that the algorithm may not only have low efficiency in local optimization but may also prematurely fall into a local optimum. Therefore, in consideration of the above defects, this study adopts a dynamic adjustment approach based on the Versoria function for the inertia weight, which is defined as follows: where parameter a is the radius of the adjacent circle of the function; the larger the a, the greater its descent rate between 0 and 1, and when parameter a is set to 1, it is the standard Versoria function. Considering the global optimization and convergence speed, the parameter a value in this paper is set at 2.
Consequently, the following adaptive inertia weight adjustment formula is obtained: where $w(i)$ is the inertia weight coefficient at the $i$th iteration, $w_{max}$ and $w_{min}$ are its maximum and minimum values, which are respectively set at 0.9 and 0.4 in this study, $\delta_i$ is the current number of iterations, and $\delta_{max}$ represents the total number of iterations. The Versoria function curve and the decreasing inertia weight curve are shown in Fig. 6. It can be seen from Fig. 6 that the initial function change rate is slow, which is conducive to global optimization in the solution space, while the later change rate is rapid, which promotes local search efficiency; therefore, the algorithm has both excellent global optimization and high convergence efficiency.
Fig. 6 Versoria function and its decreasing curve
Step 2. Adaptive mutation operation
To tackle the problem associated with the progressive decline in the variety of the particle population in the iterative computation, Gandelli et al. (2007) proposed a GA-PSO hybrid optimization algorithm based on various hybrid strategies (static, dynamic, alternating, and adaptive mutation), which were verified using multimodal benchmark problems. Therefore, based on these results, this paper adds an adaptive mutation operation to the PSO iteration process. The adaptive mutation probability is calculated as follows:
$$rand > \frac{1}{2}\left(1 + \frac{\delta_i}{\delta_{max}}\right) \tag{11}$$
where in (11), rand is a random number between 0 and 1, $\delta_i$ is the current number of iterations, and $\delta_{max}$ is the total number of iterations. Initially, the particles escape from the group optimization direction with a certain probability, and as the number of generations increases, the particle mutation likelihood progressively decreases. This operation effectively boosts the population diversity and mitigates the risk of particles falling into a local optimum solution.
The Schwefel 1.2 function (Yaghoobi and Esmaeili, 2017) is utilized as the benchmark function to assess the efficiency of IPSO in this work. As its independent variable has an epistatic effect, its gradient direction does not change along the axis, resulting in high optimization complexity. The independent variable for this function ranges from −100 to 100. When $x^*$ takes 0, the function has a global minimum $f(x^*) = 0$:
$$f(x) = \sum_{i=1}^{n}\left(\sum_{j=1}^{i} x_j\right)^{2} \tag{12}$$
To evaluate the optimization performance of the PSO and IPSO, the swarm initialization parameters of both algorithms were set to the same values and each algorithm was iterated 1000 times, with their respective fitness function values compared at the end of the iterations. As shown in Fig. 7, the PSO reaches a local optimum before the 700th iteration, the PSO and IPSO have the same optimization efficiency around the 920th iteration, and after that, the IPSO is still approaching the global minimum. Although the IPSO has yet to converge after 1000 iterations, the global optimization capability of the IPSO is still superior, demonstrating the effectiveness of the improvements over the standard PSO.
As the LSTM layers contain several hyperparameters, such as the learning rate, the number of hidden units, and the number of epochs, slight changes to certain parameters may result in substantial differences in the fitting outcomes. Because there is a huge number of combinations of hyperparameter values, it is challenging to identify the best settings for an MSW generation forecasting model based on deep learning methods (Kim and Cho, 2021). Consequently, this study optimizes the forecasting model's performance using the IPSO algorithm and selects the reciprocal of the mean square error for the training set as the fitness function, which automatically searches the hyperparameter values for the LSTM neural network using iterative calculation and information sharing between the populations. The IPSO-LSTM model parameter settings are divided into two parts: the IPSO parameter initialization and the LSTM parameter search interval. The specific parameter settings are shown in Table 2.
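The PSO update, the adaptive mutation test, and the Schwefel 1.2 benchmark described above can be combined in a short sketch. Several parts are assumptions made for illustration and should not be read as the study's settings: the Versoria-shaped schedule in `inertia_weight` is a stand-in with the stated qualitative behavior (slow early decay from 0.9, faster later decay to 0.4, a = 2) rather than the study's exact formula, mutation is implemented as re-initializing a particle at a random position, and the swarm size, dimension, iteration count, and c1 = c2 = 2.0 are arbitrary choices.

```python
import numpy as np


def schwefel_1_2(x):
    """Schwefel 1.2: sum over i of (x_1 + ... + x_i)^2, minimum 0 at x = 0."""
    return float(np.sum(np.cumsum(x) ** 2))


def inertia_weight(it, max_iter, w_max=0.9, w_min=0.4, a=2.0):
    """ASSUMED Versoria-shaped decay from w_max to w_min (not the paper's formula)."""
    x = it / max_iter
    shape = 1.0 / (1.0 + a * x * x)       # equals 1 at x = 0
    floor = 1.0 / (1.0 + a)               # value of shape at x = 1
    return w_min + (w_max - w_min) * (shape - floor) / (1.0 - floor)


def ipso(objective, dim=5, n_particles=30, max_iter=200,
         bounds=(-100.0, 100.0), c1=2.0, c2=2.0, seed=0):
    """PSO with a decaying inertia weight and an adaptive mutation step."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    history = [gbest_val]
    for it in range(1, max_iter + 1):
        w = inertia_weight(it, max_iter)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity and position updates of standard PSO.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        # Adaptive mutation: triggered when rand > 0.5 * (1 + it / max_iter),
        # so the chance of mutating shrinks from about 0.5 to 0 over the run.
        mutate = rng.random(n_particles) > 0.5 * (1.0 + it / max_iter)
        pos[mutate] = rng.uniform(lo, hi, size=(int(mutate.sum()), dim))
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        history.append(gbest_val)
    return gbest, gbest_val, history


best, best_val, history = ipso(schwefel_1_2)
```

For hyperparameter search, `objective` would instead train an LSTM for a candidate (learning rate, epochs, hidden-layer sizes) vector and return a training loss to minimize, which is equivalent to maximizing the reciprocal of the error.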
In Table 2, for the IPSO parameters, PN denotes the population size for the proposed algorithm, MAXITER is the number of iterations, c1 and c2 are respectively the individual learning factor and group learning factor, r1 and r2 are random numbers ranging from 0 to 1, and w min and w max are the minimum and maximum inertia weight values. For the LSTM parameters, LR is the learning rate search range, EPOCH is the search range for the number of neural network training epochs, and NFHL and NSHL respectively indicate the search ranges for the number of nodes in the first and second hidden layers.
The grey system theory, which was proposed by Deng in 1981 (Wang et al., 2010), can utilize a small quantity of data and imperfect information to anticipate and regulate a system's future trends and situations. As the GM (1,1) requires a small amount of data, is easy to operate, and obtains good forecast performance, it has become the most commonly used grey forecasting model and has been widely used in various fields of study. The fundamental idea of the GM (1,1) model is to test the stationarity of the original series, accumulate and generate new series to weaken the randomness, and establish corresponding differential equation models to reveal the characteristic laws for the new series so that an orderly sequence analysis can be performed and the predictions can be made. However, the predictive precision of the original GM (1,1) model needs to be enhanced because errors can originate from the background value, the initial value selection, and the data stationarity (Liu et al., 2014). Therefore, to minimize the errors associated with the background value optimization strategy, this study conducted a translation transformation on the original sequence to enhance its stability and diminish the evident modeling process system errors.
Before the translation transformation, the known sequence is tested to confirm that the modeling method is feasible. Set the original sequence as $x^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n)\}$, and then calculate the stepwise ratio:
$$\lambda(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \cdots, n \tag{13}$$
This sequence can then be used for grey prediction if all stepwise ratios $\lambda(k)$ fall within the allowable coverage $\Theta = \left(e^{-\frac{2}{n+1}}, e^{\frac{2}{n+1}}\right)$. Otherwise, it needs to be further transformed to improve the forecasting accuracy. In this experiment, an appropriate constant c was added to each element of the original sequence to ensure it passed the stepwise ratio test:
$$y^{(0)}(k) = x^{(0)}(k) + c, \quad k = 1, 2, \cdots, n \tag{14}$$
The GM (1,1) time response formula demonstrates that the model's prediction accuracy is highly reliant on the development coefficient (a) and grey control variable (b) calculations, which are contingent on precise background value calculations. However, one of the prime causes of conventional GM (1,1) inaccuracy is approximate background value calculations (Xu et al., 2015). The background value optimization method based on error minimization proposed by Xu et al. (2015) provides a direction for decreasing these system errors. Therefore, based on this idea, the integral mean value theorem was applied to the real background value, and the least-squares method was used to estimate the grey differential equation parameters and ensure that the background value had both unbiased and minimum errors. Based on the above analysis, a linear combination background value construction form is adopted; therefore, the GM (1,1) grey differential equation model was amended as follows: where $x^{(0)}$ is the original sequence, $x^{(1)}$ is the sequence generated by its accumulation, a is the development coefficient, and b is the grey control variable. Therefore, $r = (a^*, b^*)^T$ are the parameters to be solved in the model, which can be obtained using the least-squares method.
The estimated a, b, and α parameter values using the least-squares method are: Substituting the above parameters into the whitening differential equation yields the discrete time response formula: After the progressive reduction (inverse accumulation), the forecast sequence is obtained as follows:
$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k) \tag{18}$$
To investigate the potential evolutionary impact of the selected factors under different policy orientations and social development backgrounds on future MSW generation trends, a scenario analysis method based on a hybrid forecasting model was adopted to predict the MSW generation in Shanghai from 2025 to 2035. As shown in Fig. 8, the hybrid forecasting model proposed in this paper was divided into three phases: forecasting the input variables, constructing the causal relationship forecasting model, and scenario forecasting.
Fig. 8 Forecasting model construction
The first phase considered the future socio-economic development possibilities, for which the growth trends for each influencing factor were divided into three scenarios and the economic and social factors were predicted using the improved GM (1,1). For the medium scenario, the index growth was based on normal economic and social development. As the CUPS established a defined plan for the permanent resident population and the number of overseas tourists, the demographic factors are expected to rise linearly and steadily to a peak. The development trends for each influencing factor under the high and low scenarios were calculated based on the medium scenario to demonstrate the growth trends under various circumstances, such as diversified policy orientations, socio-economic development, and demographic changes. For the second phase, a causal relationship forecasting model was developed based on the IPSO-LSTM, which was then employed to fit the highly nonlinear relationships between the nine influencing factors and the MSW generation.
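The first-phase grey forecasting steps (stepwise ratio test, translation by a constant, least-squares estimation, time response, and inverse accumulation) can be sketched as follows. This sketch uses the conventional GM(1,1) with a fixed background value weight `alpha = 0.5`, which is an assumption; it does not implement the study's optimized background value. The helper names, the step-wise search for the constant c, and the synthetic series are likewise hypothetical.

```python
import numpy as np


def passes_ratio_test(x0):
    """True if every stepwise ratio x(k-1)/x(k) lies in (e^(-2/(n+1)), e^(2/(n+1)))."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    lo, hi = np.exp(-2.0 / (n + 1)), np.exp(2.0 / (n + 1))
    ratios = x0[:-1] / x0[1:]
    return bool(np.all((ratios > lo) & (ratios < hi)))


def translate(x0, step=1.0, max_steps=100000):
    """Add the smallest multiple of `step` that makes the series pass the test."""
    x0 = np.asarray(x0, dtype=float)
    for k in range(max_steps + 1):
        c = k * step
        if passes_ratio_test(x0 + c):
            return x0 + c, c
    raise ValueError("no suitable translation constant found")


def gm11_forecast(x0, horizon, alpha=0.5):
    """Conventional GM(1,1): fitted values followed by `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                  # accumulated series
    z = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]        # background values
    B = np.column_stack([-z, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]    # x0(k) + a*z(k) = b
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time response
    x0_hat = np.empty(n + horizon)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)                        # inverse accumulation
    return x0_hat, a, b


# Synthetic series (hypothetical) that fails the ratio test until shifted.
raw = np.array([1.0, 2.0, 3.5, 5.0])
shifted, c = translate(raw)
fitted, a, b = gm11_forecast(shifted, horizon=2)
forecast = fitted - c        # undo the translation to return to original units
```

The translation constant is subtracted again after forecasting so that the results are expressed on the scale of the original series.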
As the LSTM neural network has many hyperparameters to be adjusted, manual tuning is unlikely to find an optimal parameter set; therefore, the IPSO algorithm was adopted to search for the hyperparameter combination that minimized the sum of the mean squared error (MSE) and the L2 regularization term on the training set. If the mean absolute percentage error (MAPE) of the test set fulfills the assessment requirements given in the literature (Table 3), the model performance is considered excellent and the model can be used for the subsequent multifactor prediction. Finally, in the third phase, the predicted values for the influencing factors were used as the input of the IPSO-LSTM model to determine the MSW generation predictions under the three scenarios, which completed the three-phase causal relationship forecasting method based on the IPSO-LSTM. The prediction model integrates the time series prediction method with the causal relationship forecasting method to alleviate the limitations of any single prediction method, considers the possible diverse future MSW generation growth trends, and pays attention to the analyses of the factors and coordination relationships that are crucial to the system's development. Based on the real-world Shanghai data from 1980 to 2019, the optimized LSTM neural network hyperparameters, LR, EPOCH, NFHL, and NSHL, are shown in Table 4. To test the performance of this hyperparameter combination and evaluate the IPSO-LSTM forecasting accuracy, five evaluation metrics were chosen as the assessment indicators: mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), the coefficient of determination ($R^2$), and the Theil inequality coefficient (TIC). The relevant formulas were as follows: where $y_i$ was the actual value, $\hat{y}_i$ was the forecast, and $\bar{y}$ was the average value of the dataset.
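For readers who want to reproduce the five indicators, the sketch below computes them with their common textbook definitions. It is an illustration only: the exact formulas used in the paper are not reproduced in this text, so the definitions here (in particular MAPE expressed as a fraction and TIC in its bounded form) are assumptions, as are the function name and the toy data.

```python
import math


def forecast_metrics(actual, forecast):
    """Return MAPE, RMSE, MAE, R2 and TIC for two equal-length sequences.

    Assumed definitions:
      MAPE = mean(|forecast - actual| / |actual|), as a fraction
      RMSE = sqrt(mean((forecast - actual)^2))
      MAE  = mean(|forecast - actual|)
      R2   = 1 - SS_res / SS_tot
      TIC  = RMSE / (sqrt(mean(forecast^2)) + sqrt(mean(actual^2)))
    """
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mean_actual = sum(actual) / n

    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    tic = rmse / (rms_forecast + rms_actual)

    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "R2": r2, "TIC": tic}


# Toy data, not taken from the paper.
toy_actual = [100.0, 200.0, 300.0, 400.0]
toy_forecast = [110.0, 190.0, 310.0, 390.0]
toy_metrics = forecast_metrics(toy_actual, toy_forecast)
```

With these definitions RMSE and MAE keep the units of the data, while MAPE, $R^2$, and TIC are dimensionless, which makes them easier to compare across models.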
MAPE, $R^2$, and TIC generally vary from 0 to 1. The closer $R^2$ is to 1, the more closely the forecast values follow the trend of the actual values, and the lower the TIC value, the smaller the gap between the forecast and the actual values, indicating better forecasting precision. The RMSE and MAE range from 0 to positive infinity, and smaller values are better. For MAPE, Hsu and Wang (2007) proposed the corresponding assessment approach: The data from the first 32 years were taken as the training set and the subsequent 8 years were taken as the test set. These were then respectively fed into the IPSO-LSTM model to determine the prediction results, which are shown in Fig. 9 and Table 5. Over the training iterations, the errors on the training set and the test set converged and the optimal solution was determined. The evaluation metrics also showed that the model had an excellent fitting ability and generalization capacity, particularly at the mutation points, which meant that, given the future influencing factors, it could be applied to predict future MSW generation. Back propagation neural network (BPNN) and support vector regression (SVR) are two widely used models in the MSW forecasting area; thus, to further verify the performance of the proposed hybrid model, two comparisons were performed: (1) the first comparison shows the performance of the LSTM model relative to the BPNN and SVR methods, and (2) the second comparison investigates the influence of IPSO on the LSTM neural network. Table 6 shows the error outcomes, and Fig. 10 shows the error radar plots for the five distinct methods, in which the greater the comprehensive forecast effectiveness, the smaller the region in the radar graphic.
As can be seen, the LSTM prediction accuracy was higher than that of the BPNN and SVR, which verified that, compared with traditional neural networks and the other nonlinear regression methods, the LSTM has excellent generalization ability and can fit highly nonlinear systems. The IPSO-LSTM precision was found to be better than the LSTM precision, which suggested that the IPSO-optimized hyperparameters allow the LSTM to thoroughly exploit its superior prediction ability.

Fig. 9 Performance of IPSO-LSTM model: a Loss curve of the training set and test set; b Overall prediction effect

The posterior error detection criteria, namely the small error probability ($p$) and the post-error ratio ($C$), were employed to assess the GM (1,1) accuracy. Under the assumption that $S_1$ and $S_2$ were the respective standard deviations for the original sequence $X^{(0)}$ and the error sequence $\varepsilon^{(0)}$, $S_1$ and $S_2$ were calculated as follows: The small error probability ($p$) and post-error ratio ($C$) were then respectively calculated as follows: where the $p$-value was the error fraction within the acceptable range, and the $C$ value was the change rate for the difference between the predicted and the actual values (Yousuf et al., 2021). Therefore, the greater the $p$-value and the lower the $C$ value, the better the forecasting performance. The acceptable $p$ and $C$ levels are shown in Table 7. To verify its effectiveness, the IGM(1,1) was then compared with the original GM (1,1) based on the above evaluation metrics, with the results shown in Table 8, which shows that the IGM(1,1) has good prediction error acceptability and only a small gap between the actual and forecast values.

Scenario analysis is a qualitative prediction approach that assesses the prediction object based on the assumption that a specific trend is expected to continue in the future.
The scenario analysis approach, which considers as many alternatives as possible, is more feasible than single-point forecasting methods and provides decision-makers with comprehensive decision-making schemes. Therefore, scenario analysis has been widely used for energy and environmental modeling. As this study intended to simulate the impact of multiple influencing factors on MSW generation under different growth tendencies, the scenario analysis needed to adequately address the probable social, economic, and demographic developments. Given this context, the Shanghai MSW generation from 2025 to 2035 was predicted for three separate scenarios. As the CUPS explicitly stated that Shanghai's permanent residents would be controlled at 25 million and the number of overseas tourists was predicted to be 14 million by 2035, these two indicators were independently analyzed. The improved GM (1,1) was then applied to evaluate the additional factors not expressly noted in this plan. The scenario definitions, scenario descriptions, and the definition basis were as follows:

I. Medium scenario

The medium scenario is constructed based on the national 14th Five-Year Plan and the CUPS. In this scenario, it is expected that Shanghai would strictly control its urban scale and control the resident population growth to 25 million. It would also exploit its urban tourist resources to nurture and develop high-quality tourism service functions to become a world-renowned tourism destination city, with the number of overseas tourists increasing to 14 million by 2035. As the core city of the world-class Yangtze River Delta urban agglomeration, Shanghai would take full advantage of its development potential and aggressively explore new economic models. Therefore, the GDP, TI, CPI, RPI, UPCCE, and UPCDI economic factors would generate new growth vitality.
Socially, Shanghai would seek to improve people's livelihoods by expanding their livelihood benefits and continuing to steadily increase its investment in urban infrastructure construction. Under this medium scenario, the improved GM (1,1) was used to anticipate the

II. Low scenario

Demographic factors have been identified as the primary contributors to MSW generation (Ayeleru et al., 2021). However, in the next 25 years, China is expected to become an aging society at a faster pace than Japan and some western developed countries (Chen et al., 2019), which means there would be a decrease in per capita consumption and the workforce proportion and a reduction in per capita MSW generation. It is also expected that the repercussions from the continuing COVID-19 pandemic would result in a sustained global economic downturn, which means that Shanghai's economic development would inevitably suffer because of its status as the free trade port with one of the world's largest import and export quantities. The post-COVID-19 era consumption psychology would also lead to more conservative consumption behavior (Ren and Guo, 2020). In this context, the relevant influencing factors under this low scenario would show deteriorating trends and be under great pressure. The low scenario, therefore, sees the resident population grow to only 20 million, the number of overseas tourists grow to only around 10 million, and the other influencing factors decline by 10% compared with the medium scenario.

III. High scenario

To promote the long-term balanced development of the population and optimize its fertility policy, China would fully liberalize its population policy by implementing a three-child fertility policy and corresponding supporting measures, which would alleviate the population aging, increase the workforce proportion and consumption capacity, and indirectly lead to a rise in the per capita MSW generation.
China's 14th Five-Year Plan and its long-term objectives for 2035 explicitly stated that it would accelerate the establishment of a "dual circulation" development pattern in which the domestic economic cycle plays a leading role while the international economic cycle remains its extension and supplement. As the metropolis at the forefront of economic aggregate and comprehensive urban functions, to advance people's livelihoods, Shanghai would deepen reforms, optimize its economic structure, execute a high-level opening to the outside world, strengthen its public service system, and enhance its urban management system. As the associated infrastructure construction would be enhanced under this planning pattern, the overall fixed assets investment would increase. Therefore, under the high scenario, there would be a significant increase in the relevant influencing factors, with the resident population growing to 30 million, the number of overseas tourists growing to 20 million, and the other influencing factors being 10% greater than in the medium scenario.

Based on the above three scenario development trends, the anticipated influencing factor values from 2025 to 2035 were respectively determined, the results for which are shown in Fig. 11. The annual MSW generation from 2025 to 2035 was determined by inputting the forecast results for the annual influencing factors under the different scenarios into the IPSO-LSTM model, the results for which are shown in Fig. 12. The projection outcomes showed that the MSW generation under all three scenarios would rise every year; however, the growth rate would be somewhat slower than the historical trend in the recent 5 years, which could be because of the strict control of Shanghai's urban scale and resident population. It was also observed from Fig.
13 that the MSW generation gap between the high and medium scenarios was decreasing each year, which indicated that although the primary influencing factor, population growth, would be limited by the urban scale planning, the continuous economic and social development would result in higher personal consumption levels and social service capacity, which would indirectly lead to an increase in MSW generation.

The analysis indicated that the MSW generation under the medium scenario would move closer to the high scenario over time. Therefore, to achieve its long-term goal of harmonious coexistence between humans and nature and the building of an ecologically sound, socialist modern international metropolis with world influence by 2035, and to deal with a possible environmental waste crisis in the future, Shanghai would need to take the following whole life cycle measures:

1. Strengthen citizen awareness of waste categorization and increase investment in municipal infrastructure construction. Effective waste categorization is a crucial requirement for subsequent resource utilization. The government and social institutions should adopt multiple information channels, such as television, internet, school education, and advertising, to raise public awareness of waste classification, recycling, and green habits. Because the prediction results indicated that the MSW generation under the medium scenario would approach that of the high scenario, the planning and construction of municipal infrastructures, such as waste transfer stations, recycling bins, and landfill sites, need to be increased to alleviate the pressure on the current waste recycling and treatment systems and improve overall MSW disposal efficiency.

2. Realize efficient MSW management by integrating digitization and informatization. Digitization and informatization can coordinate and connect materials and information flow using automated platforms and the Internet of Things (IoT) to categorize, sort, or recycle consumer and business waste. Therefore, to ensure more sustainable resource consumption, developing a large city-wide data hub could increase industrial productivity, minimize overproduction, decrease waste at the generation source, and also allow for the more effective recovery and recycling of MSW secondary materials.

3. Adopt emerging technologies for waste-to-energy and waste-to-material. Shanghai's whole life MSW treatment cycle should adopt sophisticated and mature technologies, such as material recovery with composting, for resource utilization and final stage harmless treatment to decrease its environmental impact. Compared with landfill disposal, which has been widely accepted and applied, thermal treatments such as incineration, gasification, and pyrolysis and biological treatments such as aerobic composting and anaerobic digestion not only have less secondary pollution but can also convert MSW into fertilizer, biogas, biofuel, electricity, and other resources.

To achieve its long-term goal of becoming an outstanding global city and a socialist modern international metropolis with global influence by 2035, Shanghai must coordinate its municipal infrastructures and modernize its comprehensive solid waste treatment capacity; however, these goals are reliant on accurate MSW generation forecasts. To fully consider Shanghai's urban, economic, and social development plans, three possible future scenarios were developed, to which the proposed IPSO-LSTM model was applied to predict the future MSW generation and provide suggestions for the development of an effective future MSW treatment system in Shanghai.
The model analysis revealed that the IPSO significantly enhanced the fitting performance of the LSTM neural network, especially at the mutation points, and compared with the LSTM, BPNN, and SVR reference models, the IPSO-LSTM was found to have better prediction performance. It was also found that the proposed IGM(1,1) employed in this research outperformed the original GM(1,1) in terms of prediction performance and medium- and long-term prediction efficiency. Finally, compared with single time series forecasting or causal relationship forecasting methods, the IPSO-LSTM and IGM(1,1) scenario prediction method proposed in this paper accounted for the possible diverse future MSW generation growth trends by analyzing the factor coordination relationships crucial to the MSW system development, and thereby overcame the shortcomings associated with single methods. The experimental results showed that MSW generation from 2025 to 2035 would continue to increase under all three scenarios, with the generation in 2035 ranging from 10.24034 million tons to 11.40179 million tons. Currently, there are 11 waste treatment facilities of different specifications in Shanghai, which together have a daily treatment capacity of 13,922.5 tons and an annual treatment capacity of around 5.082 million tons. Therefore, by 2035, the existing waste treatment system would need to operate at 2.22 to 2.24 times its designed scale, which would accelerate equipment depreciation and pose a significant threat to the safe operation of the waste treatment system. Therefore, to decrease the MSW generation, greater investment is needed in the waste treatment system, the public's waste categorization awareness must be raised, and informatization and digitalization must be incorporated into the waste management systems.

Fig. 13 The MSW generation gap between different scenarios
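The capacity comparison above is simple arithmetic and can be checked with a short sketch. It assumes the combined daily treatment capacity is 13,922.5 tons (that is, 1.39225 × 10^4 tons), which is the reading under which daily capacity × 365 gives roughly 5.082 million tons per year and the reported ratio of about 2.24 for the highest 2035 forecast is reproduced. The function names are hypothetical.

```python
def annual_capacity_million_tons(daily_capacity_tons, days_per_year=365):
    """Convert a daily treatment capacity in tons to million tons per year."""
    return daily_capacity_tons * days_per_year / 1e6


def load_ratio(generation_million_tons, capacity_million_tons):
    """Forecast generation divided by designed annual capacity."""
    return generation_million_tons / capacity_million_tons


# Assumed daily capacity of the 11 facilities combined, in tons.
capacity = annual_capacity_million_tons(13_922.5)

# Highest 2035 forecast quoted in the text, in million tons.
ratio_2035_max = load_ratio(11.40179, capacity)
```

Under this assumption the highest 2035 forecast corresponds to a load of roughly 2.24 times the designed annual capacity, which matches the upper end of the range quoted in the text.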
This study provided a novel MSW generation forecasting method that authorities and academics can use to reveal long-term MSW generation implications, which have rarely been examined in previous literature, and it can hopefully inspire further research in the field. However, the limitations of this work were the possible dataset heterogeneity and the lack of data from lower administrative units. Future research could conduct surveys to collect additional MSW generation data from lower residential regions to complement or validate this algorithm, and could carry out comparative studies of MSW management strategies between Shanghai and other international metropolises.

Artificial intelligence applications in solid waste management: a systematic research review
Forecast generation model of municipal solid waste using multiple linear regression
Forecasting municipal solid waste quantity using artificial neural network and supported vector machine techniques: a case study of Johannesburg, South Africa
An analysis of recycling impacts on solid waste generation by time series intervention modeling
Time series analysis of the effects of refuse collection on recycling: Taiwan's "Keep Trash Off the Ground" measure
Large scale protein profiling by combination of protein fractionation and multidimensional protein identification technology (MudPIT)
Prediction analysis of solid waste generation based on grey fuzzy dynamic modeling
Protein post-translational modification site prediction using deep learning
Trajectory control of electrohydraulic position servo system using improved PSO-PID controller
Development and validation of different hybridization strategies between GA and PSO
Forecasting municipal solid waste generation using prognostic tools and regression analysis
Long short-term memory
Forecasting the output of integrated circuit industry using a grey model improved by the Bayesian analysis
Modeling and prediction of regional municipal solid waste generation and diversion in Canada using machine learning approaches
Optimizing CNN-LSTM neural networks with PSO for anomalous query access control
A review on prediction of municipal solid waste generation models
Estimation of municipal solid waste amount based on one-dimension convolutional neural network and long short-term memory with attention mechanism model: a case study of Shanghai
Demand gap analysis of municipal solid waste landfill in Beijing: based on the municipal solid waste generation
Tourism flows prediction based on an improved grey GM(1,1) model
Alkali-treated incineration bottom ash as supplementary cementitious materials
Particle swarm optimization (PSO). A tutorial
Time series analysis and forecasting techniques for municipal solid waste management
A new method for forecasting energy output of a large-scale solar power plant based on long short-term memory networks: a case study in Vietnam
Development of machine learning-based models to forecast solid waste generation in residential areas: a case study from Vietnam
Detection of long-term effect in forecasting municipal solid waste using a long short-term memory neural network
Public mental health in post-COVID-19 era
A novel genetic LSTM model for wind power forecast
A modified particle swarm optimizer
Predictive analysis of urban waste generation for the city of Bogotá, Colombia, through the implementation of decision trees-based machine learning, support vector machines and artificial neural networks
Deep learning for industrial KPI prediction: when ensemble learning meets semi-supervised data
Forecasting of municipal solid waste generation using non-linear autoregressive (NAR) neural models
Forecasting the electronic waste quantity with a decomposition-ensemble approach
An approach to increase prediction precision of GM(1,1) model based on optimization of the initial condition
An overview of the municipal solid waste management modes and innovations in Shanghai
A hybrid procedure for MSW generation forecasting at multiple time scales in Xiamen City, China
Optimization method of background value in GM(1,1) model based on least error
An improved artificial bee colony algorithm for global numerical optimisation
Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering
A modified GM(1,1) model to accurately predict wind speed
Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm
Nonlinear dynamic soft sensor modeling with supervised long short-term memory network
Predicting the elderly's quality of life based on dynamic neighborhood environment under diverse scenarios: an integrated approach of ANN, scenario analysis and Monte Carlo method

The datasets used and analyzed in this study are available from the Shanghai statistical yearbooks (http://tjj.sh.gov.cn) and the National statistical yearbooks (http://www.stats.gov.cn).

Ethical approval Not applicable.

Consent to participate All authors participated in this work. All authors allow the publication of this paper. The authors declare no competing interests.

Author contribution Deyun Wang: conceptualization, methodology, software, validation, formal analysis, supervision, writing-original draft, writing-review and editing. Ying-an Yuan: methodology, data collection and curation, software, validation, formal analysis, visualization, writing-original draft, writing-review and editing. Yawen Ben: conceptualization, validation, writing-review and editing. Hongyuan Luo: writing-review and editing. Haixiang Guo: writing-review and editing.