key: cord-0331606-y1rvkto7 authors: Yang, Siyue; Bao, Yukun title: Comprehensive learning particle swarm optimization enabled modeling framework for multi-step-ahead influenza prediction date: 2021-10-27 journal: nan DOI: 10.1016/j.asoc.2021.107994 sha: 8eaf5495d423b41803cc6c6e03d294702a45774c doc_id: 331606 cord_uid: y1rvkto7 Epidemics of influenza are major public health concerns. Since influenza prediction always relies on the weekly clinical or laboratory surveillance data, typically the weekly Influenza-like illness (ILI) rate series, accurate multi-step-ahead influenza predictions using ILI series is of great importance, especially, to the potential coming influenza outbreaks. This study proposes Comprehensive Learning Particle Swarm Optimization based Machine Learning (CLPSO-ML) framework incorporating support vector regression (SVR) and multilayer perceptron (MLP) for multi-step-ahead influenza prediction. A comprehensive examination and comparison of the performance and potential of three commonly used multi-step-ahead prediction modeling strategies, including iterated strategy, direct strategy and multiple-input multiple-output (MIMO) strategy, was conducted using the weekly ILI rate series from both the Southern and Northern China. 
The results show that: (1) The MIMO strategy achieves the best multi-step-ahead prediction, and is potentially more adaptive for longer horizons; (2) The iterated strategy demonstrates special potential for deriving the least time difference between the occurrence of the predicted peak value and the true peak value of an influenza outbreak; (3) For ILI in Northern China, the SVR model implemented with the MIMO strategy performs best, and SVR with the iterated strategy also shows remarkable performance, especially for accurate prediction during outbreak periods; while for ILI in Southern China, both SVR and MLP models with the MIMO strategy have competitive prediction performance.

Seasonal influenza is an acute respiratory infection caused by influenza viruses, which circulates worldwide and remains a serious public health problem. According to the World Health Organization, influenza epidemics are estimated to cause about 3 to 5 million cases of severe illness and about 290,000 to 650,000 respiratory deaths each year. During outbreak seasons, hospitals and public disease control departments are under huge pressure to take countermeasures such as medical resource allocation and vaccination campaigns. These decisions and plans are made mainly on the basis of clinical and laboratory surveillance data issued by local and national centers of disease control, typically weekly influenza-like illness (ILI) rate series. 
Nevertheless, the released surveillance data intrinsically describes influenza occurred in the past, limiting their utility for public health decision making [1] . Considering the annually distinct timing and intensity of influenza epidemics, accurate prediction of influenza and its outbreak with longer time horizon in advance could provide reliable epidemic signals for public health response. As mentioned in [2] , improving influenza prediction continues to be a central priority of global health preparedness efforts. A variety of influenza prediction approaches have been summarized and evaluated in literature reviews [3] [4] [5] and in the documentary works of the Centers for Disease Control and Prevention (CDC) Challenges and FluSight projects [1, 2] . Time series models, which predict the trend of influenza in the future through analyzing the potential temporal relationships and outbreak patterns of historical data, remain dominant in extant researches. Traditional statistical time series models widely applied to forecast the influenza, include Autoregressive integrated moving average (ARIMA) [6] [7] [8] [9] [10] [11] , Generalized linear model (GLM) [7, 12, 13] , Least absolute shrinkage and selection operator (LASSO) [7, 8, 14] and other types of regression models [15] [16] [17] . Due to the boom of data-driven technology, machine learning models have been employed in time series forecasting, demonstrating the superiority in modeling complex non-linear relationships between target and dependent variables [18] . As for influenza prediction, Support vector regression (SVR) [11, 13, 19, 20] , Random forest (RF) [11, 13, 21] , Gradient boosting (GB) [11, 13] and deep learning models such as Long short-term memory (LSTM) [9, 13, 19, [22] [23] [24] have been exploited in influenza prediction. 
In addition, ensemble approaches gathering multiple models have become popular in recent influenza prediction research, including Bayesian model averaging (BMA) [7, 8], stacking [11, 25, 26] and other approaches [14, 27]. Combining multiple models has been shown to improve the overall forecasting performance compared to single models in the abovementioned studies. Early research on influenza prediction mainly focused on one-step-ahead prediction, namely using all or some of the observations to estimate a variable of interest for the time-step immediately following the latest observation [28]. In practice, however, influenza outbreak seasons always last more than one month, and thus the earlier an influenza outbreak is predicted, the more thoroughly hospital resources can be prepared and arranged. Therefore, multi-step-ahead prediction, namely predicting a sequence of values ahead of the latest observation [29], is required to support public health decision-making against influenza epidemics. In recent years, multi-step-ahead prediction has been taken into consideration in influenza prediction [9] [10] [11] [16, 23, 25, 26]. However, prior research rarely paid specific attention to the different multi-step-ahead modelling strategies when implementing influenza prediction models, and the dominant default strategy was the iterated one only. As Taieb et al. [30] concluded, there are at least three major modelling strategies for multi-step-ahead prediction. In the iterated strategy (also called the recursive strategy), multi-step-ahead prediction is conducted by training a one-step-ahead forecasting model and subsequently iterating the one-step-ahead forecast h times to achieve h-step-ahead prediction. In the direct strategy, h single-output forecasting models are separately trained, one for each step, using the same observations as input variables, which leads to high computational costs. 
In the multiple-input multiple-output (MIMO) strategy, one multi-output model is trained to forecast the whole horizon in one shot, with the predicted values in the form of a vector rather than a scalar quantity. The MIMO strategy is available for machine learning models with an intrinsic multiple-output structure and is unsuitable for most traditional linear regression models. As studied in the energy area [31] and with simulated data [28, 30], different strategies can significantly influence multi-step prediction performance on the same data sets, especially in the case of machine learning models. On the other hand, most existing multi-step-ahead influenza prediction models are evaluated merely using averaged statistical metrics such as mean squared error (MSE), root mean squared error (RMSE) and mean absolute percentage error (MAPE). In practice, accurate prediction of the timing and severity of influenza outbreak peaks is of great concern and quite important for decision-making. Thus, the prediction performance during influenza outbreak periods ought to be measured with close attention in order to fully evaluate the different multi-step-ahead prediction models at hand. In order to assure the high quality of multi-step-ahead prediction models with different strategies, a model tuning process including feature selection and parameter optimization should be involved in the modeling framework. However, in extant research on influenza prediction with machine learning (ML) models, the hyperparameters are mostly predefined rather than optimized. As for feature selection, input features are mainly determined either following the experience of experts or with some filter methods. For instance, Cheng et al. [11] identified the input features according to suggestions of experts in Taiwan CDC; Darwish et al. [13] proposed three types of feature space integrating time lags and first-order differences of weekly ILI rate series. 
An enumeration in small range was taken to find the best feature space. Liu et al. [22] calculated Pearson parametric correlation matrix between candidate features and ILI rate, and selected the features having significant correlation with the predicted variable. While in multi-step-ahead influenza prediction, even if the same ML model and candidate input features are applied to predict the same horizon of ILI rates, the training and predicting procedures differ among multi-step-ahead strategies, and thereby the optimum features and hyperparameters should be identified respectively to improve forecasting accuracy of each model implemented with each strategy. Such tasks could not be accomplished effectively and efficiently by the predefined hyperparameters or the filter methods selecting features merely. In recent years, hybrid modeling frameworks combining forecasting methods with optimization techniques for model tuning including feature selection and hyperparameters tuning have been widely exploited and proved to improve forecasting accuracy significantly and efficiently in many domains such as energy, economics, engineering as well as disease detection [32] [33] [34] [35] [36] [37] [38] [39] . For example, Kumar and Susan [32] built a hybrid modeling framework using fuzzy time series (FTS) forecasting and particle swarm optimization (PSO) algorithm to predict COVID-19 pandemic, where PSO was applied to optimize the values of three hyperparameters for FTS. Two specific frameworks "nested FTS-PSO" and "exhaustive search FTS-PSO" were proposed and examined on the dataset of coronavirus confirmed cases from 10 countries, and the exhaustive search FTS-PSO performed best. Altan et al. [33] developed a hybrid modeling framework for wind speed forecasting, combining long short-term memory (LSTM) network and decomposition methods with grey wolf optimizer (GWO), a swarm intelligence-based meta-heuristic algorithm to optimize the intrinsic mode function estimated outputs. 
The superiority of the proposed forecasting framework was validated on the data from five wind farms in the Marmara region, Turkey. Karasu et al. [34] proposed a forecasting framework for crude oil price based on support vector regression (SVR) with a wrapper method of feature selection using multi-objective PSO, considering both MAPE and Theil's U values as evaluation metrics. Zhang and Lim [35] constructed an ensemble transfer learning framework for optic disc segmentation with an improved PSO-based hyperparameter identification method to optimize the learning parameters in Mask R-CNN model. Evaluated on the Messidor and Drions image datasets, the proposed ensemble framework proved to significantly outperform other ensemble models integrated with original and advanced PSO variants. From previous studies, it can be found that PSO and its variants are successfully and frequently applied in plenty of existing model tuning tasks. However, the majority of extant solutions tended to treat parameter optimization and feature selection as two separate subtasks, and always focused on refining one of them in their proposed hybrid framework. In this study, to efficiently achieve the model tuning tasks in multi-step-ahead influenza prediction, we develop a unified adaptive modeling framework which is able to deal with parameter optimization and feature selection simultaneously in one optimization process, taking into account the mutual influence between choice of feature subsets and appropriate hyperparameters under different multi-step strategies. Since the unified process leads to increased complexity of the optimization problem and the traditional binary PSO was pointed out to suffer from trapping into local optimal solutions in feature selection [40] , we adopt Comprehensive Learning Particle Swarm Optimization (CLPSO) in binary form as optimizing algorithm. 
As a powerful variant of PSO, CLPSO preserves the diversity of the swarm, discourages premature convergence and thus improves the global search ability of PSO. Meanwhile, as validated in [41], which initially proposed this variant, CLPSO demonstrates significant superiority in solving multimodal problems, which indicates its suitability for complex optimization problems. In order to make the original CLPSO fit our modeling framework, we design a binary encoding approach for particles to represent the selected features and hyperparameters simultaneously, which is critical to realizing the unified model tuning process. In a nutshell, the main aim of this study is to explore the performance and potential of machine learning models implemented with different multi-step-ahead modelling strategies in multi-step-ahead influenza prediction. This can provide a comprehensive and reliable reference for practitioners and researchers applying machine learning models to multi-step-ahead influenza prediction in the future, and thus could further support the decision making of public disease control departments, hospitals, and pharmaceutical companies. The contributions of this work are summarized as follows: This study is the first to take into account the implementation and comparison of different multi-step-ahead modelling strategies for multi-step-ahead influenza prediction with machine learning models. Although some extant studies have predicted influenza in a multi-step-ahead way using machine learning models with multi-output structures, few, if any, have examined the diverse performance of a specific machine learning model under different multi-step-ahead strategies. The rest of this paper is organized as follows. Section 2 provides a brief review of the literature on influenza prediction. 
In Section 3, multi-step-ahead prediction strategies, the selected two well-established machine learning models and the proposed CLPSO-ML modeling framework are illustrated in details. Section 4 describes the experiment design including data source, data preprocessing, experimental procedure, evaluation of model performance and other implementation details. Experimental results are discussed in Section 5. Finally, conclusions are drawn in Section 6. This section briefly reviews the literature on influenza prediction using statistical approaches including both traditional regression models and machine learning models. Table 1 provides an overview of those studies. In terms of prediction target (predicted variable), Influenza-Like Illness (ILI) rate series is regarded as the proxy of confirmed influenza cases and serves as the prediction target in the overwhelming majority of the extant researches, which is generally monitored and issued by local or national CDC in different countries. ILI refers to the case that a patient has fever (temperature of 100°F [37.8°C] or greater) and a cough and/or a sore throat without a known cause other than influenza. Among those studies, ILI of multiple spatial scales in the United States has been mostly considered, including level of nation [6, 10, 16, 17, 25, 26] , state [8, 22, 27] and region defined by Health and Human Services (HHS) [9, 10, 25, 26] . ILI in China was also predicted in many cases, mainly at the scale of city and province such as Guangzhou [24] , Shenzhen [21] , Hongkong [7] , Liaoning [20] and Taiwan [11] . However, larger regional scale of ILI in China is in lack of exploration in previous researches. In this study, ILI rate series from the Southern and Northern China issued by Chinese National Influenza Center are used as data sets. As per the prediction horizon, about half of the reviewed literature focused on nowcasting and one-step-ahead prediction. 
The rest of the studies considered a longer prediction horizon, namely multi-step-ahead prediction. It is mentioned in [30] that different multi-step-ahead modelling strategies can be suitable for different lengths of prediction horizon. For instance, it was discovered in [42] that the iterated strategy beats the other two strategies for long-term flood forecasting with a neural network model. Ghysels et al. [43] compared the direct strategy, the iterated strategy and a mixed-data sampling strategy for multi-step-ahead volatility forecasting, and found that the iterated strategy performs best at short horizons while the mixed-data sampling strategy has an edge for longer horizons. However, previous research on influenza prediction conducted multi-step-ahead prediction with only one predefined strategy (mostly the iterated strategy), ignoring the comparison among different multi-step-ahead modelling strategies. Thus, this study conducted an extensive experiment to evaluate the overall performance of multi-step-ahead modelling strategies across different time horizons, ranging from two-step-ahead to ten-step-ahead, for influenza prediction. As for evaluation metrics, almost all previous research on multi-step-ahead influenza prediction did not closely examine the prediction performance during influenza outbreak periods, but only the overall time window before and after the outbreak … Peak Week Error (PWE, also called Week Difference) [7, 17], which directly calculate the time difference and magnitude difference between the true peak value and the predicted peak value, were taken as the metrics in this present study. According to the Chinese National Influenza Center, influenza viruses are extremely active in winter due to the weather conditions, which provides a rough definition of outbreak seasons. Thus, influenza outbreak seasons are defined as the periods between the 45th week of the current year and the 8th week of the next year. 
Based on the definition of influenza outbreak season for ILI in China, in this study, PWE is adopted to measure the peak time difference, while Mean Absolute Error (MAE) is additionally calculated to evaluate the prediction accuracy over the whole outbreak period. In summary, different from the existing literature, this study contributes to the influenza prediction literature by concentrating on the application and comparison of different modelling strategies for multi-step-ahead influenza prediction based on machine learning models with ILI data in Southern and Northern China, which has not yet been fully explored in previous research. Furthermore, this study analyzes the suitability of different multi-step strategies under different lengths of prediction horizon, and additionally takes influenza outbreak related accuracy metrics into account for experimental justification, both of which were ignored in extant multi-step-ahead influenza prediction research.

The notation used in this section is introduced first, in the context of influenza prediction. Given the weekly ILI rate series $\{I_1, I_2, \ldots, I_N\}$, the current and previous observations are denoted as $\mathbf{x} = [I_t, I_{t-1}, \ldots, I_{t-d+1}] \in \mathbb{R}^d$, which means that the input features are the ILI rates from week $t-d+1$ to the current week $t$, with embedded dimension $d$. The future observations are denoted as $\mathbf{y} = [I_{t+1}, I_{t+2}, \ldots, I_{t+H}] \in \mathbb{R}^H$, which means that the predicted values are the future ILI rates from week $t+1$ to week $t+H$, with $H$ as the length of the prediction horizon. In short, multi-step-ahead influenza prediction aims to find the functional relationship between the current and previous ILI rates, namely $\mathbf{x}$, and the future ILI rates, namely $\mathbf{y}$. In this study, three modelling strategies are applied for multi-step-ahead influenza prediction, namely the iterated strategy, the direct strategy, and the MIMO strategy. 
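To make the notation above concrete, the following minimal sketch builds the input vectors x and target vectors y from a series by sliding a window over it. The function name `make_windows` and the toy series are illustrative assumptions, not part of the original study.

```python
import numpy as np

def make_windows(series, d, H):
    """Build (x, y) pairs for multi-step-ahead prediction.

    Each row of X is [I_t, I_{t-1}, ..., I_{t-d+1}] (most recent value first);
    the matching row of Y is [I_{t+1}, ..., I_{t+H}].
    """
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    # t is the 0-based index of the "current week"
    for t in range(d - 1, len(series) - H):
        X.append(series[t - d + 1 : t + 1][::-1])
        Y.append(series[t + 1 : t + H + 1])
    return np.array(X), np.array(Y)

# Toy series standing in for weekly ILI rates (hypothetical values)
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, Y = make_windows(toy, d=3, H=2)
```

With `d=3` and `H=2`, the six-point toy series yields two training pairs; a real ILI series would yield `N - d - H + 1` pairs.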
As there are many variations of the strategies for specific situations and models in the extant literature, it is impossible and unnecessary to enumerate each variation. Thus, this section describes the standard version of each strategy.

In the iterated strategy, a single model is trained by minimizing the error for one-step-ahead prediction as follows:

$$I_{t+1} = f(I_t, I_{t-1}, \ldots, I_{t-d+1}) + \omega \tag{1}$$

where $d$ denotes the embedded dimension, and $\omega$ denotes the additive noise. For $H$-step-ahead prediction, the one-step-ahead prediction is obtained using the trained model, and then the predicted value is fed back into the trained one-step-ahead model when forecasting the second step after the current time $t$. This process is iterated to predict subsequent steps until the $H$th step is reached. The process is formulated as follows:

$$\hat{I}_{t+h} = \begin{cases} \hat{f}(I_t, \ldots, I_{t-d+1}), & h = 1 \\ \hat{f}(\hat{I}_{t+h-1}, \ldots, \hat{I}_{t+1}, I_t, \ldots, I_{t-d+h}), & h \in \{2, \ldots, d\} \\ \hat{f}(\hat{I}_{t+h-1}, \ldots, \hat{I}_{t+h-d}), & h \in \{d+1, \ldots, H\} \end{cases} \tag{2}$$

where $\hat{f}$ denotes the trained one-step-ahead model, and $\hat{I}_{t+h}$ denotes the $h$th value in the output series, namely the $h$th step of the total horizon.

Initially suggested by Cox [44], the direct strategy constructs a set of models to forecast each step independently with the same observations as input variables. In other words, $H$ models are trained, one for each step, which can be formulated as follows:

$$I_{t+h} = f_h(I_t, I_{t-1}, \ldots, I_{t-d+1}) + \omega_h \tag{3}$$

where $h \in \{1, \ldots, H\}$ and $\omega_h$ denotes the additive noise. After the learning process, the estimation of the next $H$ values is given as follows:

$$\hat{I}_{t+h} = \hat{f}_h(I_t, I_{t-1}, \ldots, I_{t-d+1}), \quad h \in \{1, \ldots, H\} \tag{4}$$

where $\hat{f}_h$ denotes the trained single-output model corresponding to the $h$th step.

In contrast to the above two strategies, which model the data using single-output functions (see Eq. (1) and Eq. (3)), the MIMO strategy is implemented using a function with a multiple-input and multiple-output structure. The entire horizon is predicted using one multiple-output model in the form of a vector of predicted values. The MIMO strategy takes into account the existence of stochastic dependencies among future values, which are ignored in the iterated strategy and the direct strategy. 
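The difference between the iterated and direct strategies can be illustrated with a short sketch. It assumes an ordinary least-squares linear model as a stand-in for SVR or MLP (a substitution made only to keep the example dependency-free), and all function names here are hypothetical.

```python
import numpy as np

def embed(series, d, h):
    """Rows of X are [I_t, ..., I_{t-d+1}]; y holds the h-step-ahead value I_{t+h}."""
    s = np.asarray(series, dtype=float)
    idx = range(d - 1, len(s) - h)
    X = np.array([s[t - d + 1 : t + 1][::-1] for t in idx])
    y = np.array([s[t + h] for t in idx])
    return X, y

def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for any single-output model)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return float(np.dot(coef[:-1], x) + coef[-1])

def iterated_forecast(series, d, H):
    """One one-step-ahead model; each prediction is fed back as an input."""
    coef = fit_linear(*embed(series, d, 1))
    window = list(np.asarray(series, dtype=float)[-d:][::-1])  # most recent first
    preds = []
    for _ in range(H):
        nxt = predict_linear(coef, np.array(window))
        preds.append(nxt)
        window = [nxt] + window[:-1]  # shift the window, dropping the oldest value
    return preds

def direct_forecast(series, d, H):
    """H separate models, one per step, all fed the same observed inputs."""
    s = np.asarray(series, dtype=float)
    x_last = s[-d:][::-1]
    return [predict_linear(fit_linear(*embed(s, d, h)), x_last)
            for h in range(1, H + 1)]

series = np.arange(1.0, 21.0)  # toy linear trend, not ILI data
iterated = iterated_forecast(series, d=3, H=3)
direct = direct_forecast(series, d=3, H=3)
```

On this noise-free toy trend both strategies extrapolate identically; on real data the iterated strategy accumulates its own prediction errors, while the direct strategy pays for independence with H separate training runs.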
In the MIMO strategy, one multiple-output model is trained from the ILI rate time series $\{I_1, \ldots, I_N\}$ as follows:

$$[I_{t+1}, \ldots, I_{t+H}] = F(I_t, I_{t-1}, \ldots, I_{t-d+1}) + \boldsymbol{\omega} \tag{5}$$

where $F: \mathbb{R}^d \to \mathbb{R}^H$ denotes a vector-valued function, and $\boldsymbol{\omega} \in \mathbb{R}^H$ denotes the additive noise vector with a covariance that is not necessarily diagonal [45]. After the learning process, the estimation of the next $H$ values is given as follows:

$$[\hat{I}_{t+1}, \ldots, \hat{I}_{t+H}] = \hat{F}(I_t, I_{t-1}, \ldots, I_{t-d+1}) \tag{6}$$

where $\hat{F}$ denotes the trained multiple-output model.

Support vector regression (SVR) is a competitive approach derived from the application of support vector machines to nonlinear regression and time series prediction problems. ε-SVR is adopted in this study for the iterated strategy and the direct strategy, and is briefly introduced as follows. Given a training dataset $\{(x_i, y_i)\}_{i=1}^{n} \subset \mathbb{R}^d \times \mathbb{R}$, the goal of ε-SVR is to find a function $f(x)$ that deviates by less than $\varepsilon$ from the observed $y_i$ for all the training data. Errors are accepted if they are less than $\varepsilon$, and are not allowed when they are larger than $\varepsilon$ [46]. Here $x_i$ denotes the $d$-dimensional input vector and $y_i$ denotes the corresponding target output. In the context of ILI rate forecasting, $x_i$ refers to the $d$-dimensional previous observation of the ILI rate series $\{I_t, \ldots, I_{t-d+1}\}$, and $y_i$ refers to the future value $I_{t+1}$. Given the form of $f(x)$ in Eq. (7), the ε-SVR optimization problem can be written as Eq. (8):

$$f(x) = \langle w, x \rangle + b \tag{7}$$

$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \tag{8}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product in the space $\mathbb{R}^d$; $\xi_i, \xi_i^*$ are slack variables relaxing the linear constraints for nonlinear regression data to cope with otherwise infeasible constraints of the optimization. The regularization parameter $C$ controls the trade-off between the variance and bias of the objective function. Moreover, a kernel function $K(x, x')$ can be introduced to further solve nonlinear problems. As the most widely used kernel, the Radial Basis Function kernel (RBF kernel, or Gaussian kernel) is adopted in this study, and its formulation is given as follows:

$$K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right) \tag{9}$$

where $\gamma = \frac{1}{2\sigma^2}$ is a parameter controlling the width of the kernel. 
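As a small numerical illustration of two ingredients of ε-SVR described above, the sketch below evaluates the RBF kernel with γ = 1/(2σ²) and the ε-insensitive loss that underlies the slack variables. It is only an illustration of the formulas, not a solver; the function names and sample values are assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """K(x, x') = exp(-gamma * ||x - x'||^2) with gamma = 1 / (2 * sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    # Pairwise squared Euclidean distances via broadcasting
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def eps_insensitive_loss(y_true, y_pred, eps):
    """Deviations inside the eps-tube cost nothing; larger ones cost the excess."""
    return np.maximum(0.0, np.abs(np.asarray(y_true) - np.asarray(y_pred)) - eps)

points = [[0.0, 0.0], [1.0, 0.0]]
K = rbf_kernel(points, points, sigma=1.0)
loss = eps_insensitive_loss([1.0, 2.0], [1.05, 2.5], eps=0.1)
```

A smaller σ (larger γ) makes the kernel decay faster with distance, which is why γ is one of the quantities that needs tuning together with C and ε.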
In summary, three hyperparameters, $C$, $\varepsilon$ and $\gamma$, need to be tuned during the training process. Since SVR follows the structural risk minimization principle, namely minimizing an upper bound of the generalization error, it generalizes better than many other machine learning algorithms that adopt the empirical risk minimization principle and directly minimize the training error [47, 48]. Despite these merits, the standard formulation of SVR is inappropriate for multi-step-ahead forecasting with the MIMO strategy on account of its inherent single-output structure [28]. A solution to this limitation was initially proposed by Pérez-Cruz et al. [49], who designed a multi-dimensional SVR using a cost function with a hyper-spherical insensitive zone. This structure enables SVR to perform better in multi-dimensional forecasting tasks than predicting each dimension separately using the standard SVR model. On this basis, Tuia et al. [50] constructed a multiple-output SVR model (MSVR) to estimate multiple biophysical parameters from remote sensing images simultaneously. Subsequent research has explored and validated the effect of MSVR in diverse areas [50] [51] [52]. In this study, MSVR is applied as the implementation of the MIMO strategy. Detailed information on MSVR can be found in [49, 50].

Considering the structure of its output layer, the multilayer perceptron (MLP) can be implemented with the MIMO strategy inherently. The hyperparameter to be optimized is the size of the hidden layer, namely the number of neurons in the hidden layer.

The model tuning process, including feature selection and hyperparameter optimization, always plays a significant role in forecasting tasks with ML models. Nevertheless, these two subtasks are predominantly executed separately in the literature, which is time consuming and ignores the interactive influences between features and hyperparameters [53]. 
Addressing these issues, the following adaptive scheme is developed for the multi-step-ahead influenza prediction.

Initialize the population and parameters of CLPSO
Initialize a population of particles with random positions;
Evaluate each particle with the fitness and initialize the pbest and gbest;
While the maximum iteration time or the maximum time of failing to improve the solution has not been reached:
    For each particle:
        If it's the first generation or the particle ceases improving for generations more than the predefined refreshing gap:
            Assign the exemplar particle for each dimension according to the detailed rules in 3.3.3;
        Update the velocity with Eq. (11) and the position with Eq. (13);
        Transform the particle into the features and hyperparameters by decoding the binary string;
        Train the ML model for multi-step-ahead influenza forecasting with the selected features and hyperparameters;
        Evaluate the particle taking mean squared error (MSE) of cross validation on the training set as fitness value;
        Update the personal best solution pbest;
        Update the global best solution gbest;
Use the optimum features and hyperparameters decoded from gbest to retrain the prediction model on the whole training set;
Apply the trained model on the test set and assess the prediction performance under different strategies.
End

The unified process of model tuning, including hyperparameter optimization and feature selection for the adopted machine learning (ML) models using the binary CLPSO algorithm, is shown in Fig. 1. The left part of Fig. 1 demonstrates the executing procedure of the binary CLPSO optimization algorithm, which will be introduced in detail in Section 3.3.3. Each particle in the binary CLPSO algorithm represents candidate hyperparameters and input features of the ML models, which are encoded into a 0-1 series (see Section 3.3.2 for the specific encoding and decoding methods). The right part of Fig. 
1 shows the training and testing process of the ML models. Firstly, ILI data are preprocessed and divided into a training set and a test set. During the training process, hyperparameters and selected features decoded from the particle are applied to train the model, and the performance of the trained model is evaluated as the fitness value of the current particle. For each particle, the above training and evaluating process is repeated. In other words, the model is trained with different hyperparameters and features corresponding to each particle of every generation in CLPSO, until the termination condition is satisfied. After that, the best particle is retained, comprising the optimal parameters and optimal selected features, which are subsequently used to retrain the influenza prediction model on the whole training set. Finally, the models with the multi-step strategies are applied to the test set, and metrics of forecasting performance are calculated to compare the different strategies for a specific ML model. Pseudocode of the above procedure is illustrated in Fig. 2.

In order to make CLPSO available for the unified optimization of feature selection and hyperparameter tuning, a binary encoding approach is designed so that each particle contains the information of both the selected features and the parameters. The particle in the binary CLPSO consists of two parts: the feature mask, and the hyperparameters of the adopted ML models, specifically SVR and MLP in this context. In the feature mask part, each 0-1 bit indicates whether the corresponding candidate feature is selected. In the hyperparameter part, the 0-1 string works as a whole, representing the index of the value in the candidate list by converting the binary string to a decimal integer. Fig. 3 shows an intuitive example of the binary particle representation for SVR.

This section briefly introduces the binary CLPSO used here; more detailed information about the original CLPSO algorithm can be found in [41]. 
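Before turning to the optimizer itself, the encoding described above can be sketched as a small decoding routine. Everything specific here is an assumption made for illustration: the bit layout (feature mask first, then one bit group per hyperparameter), the convention that a 1 bit means "selected", the wrap-around for out-of-range indices, and the candidate lists for C, ε and γ.

```python
def decode_particle(bits, n_features, candidates):
    """Split a 0-1 particle into a feature mask and hyperparameter values.

    bits        -- list of 0/1 integers (the particle position)
    n_features  -- number of leading bits used as the feature mask
    candidates  -- dict mapping hyperparameter name to its candidate list
    """
    # Assumed convention: bit value 1 marks a selected feature
    mask = [b == 1 for b in bits[:n_features]]
    pos = n_features
    params = {}
    for name, values in candidates.items():
        # Number of bits needed to index every candidate value
        width = (len(values) - 1).bit_length()
        chunk = bits[pos : pos + width]
        index = int("".join(str(b) for b in chunk), 2) if width else 0
        # Wrap around if the list length is not a power of two (an assumption)
        params[name] = values[index % len(values)]
        pos += width
    return mask, params

# Hypothetical candidate lists, not the values used in the study
candidates = {
    "C": [0.1, 1, 10, 100],
    "epsilon": [0.01, 0.1],
    "gamma": [0.01, 0.1, 1, 10],
}
particle = [1, 0, 1, 1,  1, 0,  1,  0, 1]
mask, params = decode_particle(particle, n_features=4, candidates=candidates)
```

Here the first four bits select lags 1, 3 and 4, and the remaining bit groups `10`, `1` and `01` index into the three candidate lists, so one particle fully specifies a model configuration.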
CLPSO is a variant of PSO using a comprehensive learning strategy, in which the best previous positions of other particles are used to update the velocity of each particle. Population diversity is preserved and the global search ability is improved under such a mechanism. The velocity is updated with the following equation:

$$v_i^d \leftarrow w \cdot v_i^d + c \cdot rand_i^d \cdot \left(pbest_{f_i(d)}^d - x_i^d\right) \tag{11}$$

where $v_i^d$ denotes the velocity and $x_i^d$ denotes the position of the $i$th particle in the $d$th dimension; $c \in \mathbb{R}$ is the acceleration coefficient; $w$ is called the inertia weight, which is used to balance the global and local search abilities; $rand_i^d$ is a random number; $pbest_{f_i(d)}^d$ is the $d$th dimension of the best previous position of the exemplar particle with index $f_i(d)$; and $ps$ denotes the size of the swarm. Specifically, the index $f_i(d)$ of the exemplar particle is determined as follows. For each dimension $d$ of particle $i$, if a generated random number in the range [0,1] is greater than the probability $Pc_i$, the corresponding dimension will learn from the particle's own best position, namely $f_i(d) = i$; otherwise, it will learn from another particle. To select the other exemplar particle, the fitness values of two randomly selected particles are compared and the one with the smaller value is taken as the exemplar, excluding the particle whose velocity is being updated. If the exemplars of all dimensions are the particle itself, then a randomly selected dimension is forced to learn from the corresponding dimension of another particle's best previous position. Once all exemplars of a particle have been selected, the particle is allowed to learn from these exemplars until it has not improved for $m$ generations ($m$ is called the refreshing gap).

In this study, the position of a particle is expressed as a binary bit vector, and thereby the velocity is the probability for each dimension to change from 1 to 0 or inversely. The update equation of the position is as follows:

$$x_i^d = \begin{cases} 1, & rand < S(v_i^d) \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

where $rand$ is a uniformly distributed random number in [0,1], and $S(\cdot)$ is a sigmoid transfer function. Theoretically, the computational complexity of the binary CLPSO in the worst case is: Therefore, the computational complexity of the total framework will be acceptable in this study. 
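The exemplar selection and the binary update described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the sigmoid is taken to be the logistic function, the velocity is clipped to a hypothetical bound `v_max` to avoid overflow, the values of `w`, `c` and `pc_i` are arbitrary, and the swarm is assumed to hold at least three particles.

```python
import numpy as np

def choose_exemplars(i, fitness, pc_i, dim, rng):
    """Pick the exemplar index f_i(d) for every dimension of particle i."""
    others = [k for k in range(len(fitness)) if k != i]
    exemplars = np.full(dim, i)
    for d in range(dim):
        if rng.random() < pc_i:
            # Tournament of two other particles; smaller fitness (error) wins
            a, b = rng.choice(others, size=2, replace=False)
            exemplars[d] = a if fitness[a] < fitness[b] else b
    if np.all(exemplars == i):
        # Force one random dimension to learn from another particle
        exemplars[rng.integers(dim)] = rng.choice(others)
    return exemplars

def update_particle(x, v, pbest, exemplars, w, c, rng, v_max=4.0):
    """One binary CLPSO step: velocity update, then sigmoid-based bit sampling."""
    dim = len(x)
    # pbest of the chosen exemplar, dimension by dimension
    target = pbest[exemplars, np.arange(dim)]
    v_new = w * v + c * rng.random(dim) * (target - x)
    v_new = np.clip(v_new, -v_max, v_max)   # assumed velocity bound
    prob_one = 1.0 / (1.0 + np.exp(-v_new)) # logistic transfer function
    x_new = (rng.random(dim) < prob_one).astype(int)
    return x_new, v_new

rng = np.random.default_rng(0)
ps, dim = 5, 8                               # toy swarm size and particle length
pbest = rng.integers(0, 2, size=(ps, dim))   # personal best positions
fitness = rng.random(ps)                     # stand-in fitness values (lower is better)
x = rng.integers(0, 2, size=dim)
v = np.zeros(dim)
exemplars = choose_exemplars(0, fitness, pc_i=0.3, dim=dim, rng=rng)
x_new, v_new = update_particle(x, v, pbest, exemplars, w=0.7, c=1.5, rng=rng)
```

Because each dimension may follow a different particle's best position, no single particle dominates the search direction, which is the mechanism the text credits for preserving swarm diversity.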
Weekly influenza-like illness (ILI) rates in Southern and Northern China from the first week of 2010 to the 12th week of 2020 are collected from the Influenza Weekly Report issued on the website of the Chinese National Influenza Center. The issued ILI rate is calculated by the following formula:
$$\text{ILI rate} = \frac{\text{number of ILI cases}}{\text{total number of outpatient and emergency visits}} \times 100\%$$
The Northern China ILI rate series (abbreviated as N_ILI) and the Southern China ILI rate series (abbreviated as S_ILI) are shown as follows.
The ILI rate series are transformed with log base 10, which can make the size of the seasonal variation the same across the whole series and therefore make the forecasting model simpler [54]. The transformation is reversed to obtain the predicted values on the original scale.
Three multi-step-ahead strategies are applied in the experiment, namely the iterated strategy, the direct strategy, and the MIMO strategy. Because of its inherent single-output property, the standard SVR can only be implemented with the iterated strategy and the direct strategy. For the MIMO strategy, MSVR is adopted. MLP, in contrast, fits all the strategies since it inherently allows multiple outputs.
To assess the forecasting performance of the different models, two statistical metrics, MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Square Error), are used in the experiment. MAPE is scale independent and hence is frequently adopted to evaluate forecasting performance across different datasets. Nevertheless, its disadvantage lies in the heavier penalty on positive errors than on negative errors. Therefore, RMSE is also introduced to measure the accuracy of prediction. As a scale-dependent metric, RMSE can only be compared among different models for the same dataset [28]. The definitions of MAPE and RMSE are given as follows:
$$\text{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%$$
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^2}$$
where $N$ denotes the number of fitted ILI rates, $y_t$ denotes the $t$th observed ILI rate, and $\hat{y}_t$ denotes the $t$th predicted ILI rate.
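The difference between the three multi-step-ahead strategies lies mainly in how training targets are formed and how forecasts are produced. The sketch below illustrates this with plain Python lists; the function names are hypothetical, and `one_step_model` stands for any fitted single-output model, represented here as a callable. It is an illustration under these assumptions rather than the implementation used in the study.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[float]


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int
) -> List[Tuple[Window, Window]]:
    """Pair each window of n_lags past values with the next `horizon` values."""
    pairs = []
    for t in range(n_lags, len(series) - horizon + 1):
        past = list(series[t - n_lags:t])
        future = list(series[t:t + horizon])
        pairs.append((past, future))
    return pairs


def iterated_targets(pairs: List[Tuple[Window, Window]]) -> List[Tuple[Window, float]]:
    """Iterated strategy: a single model is trained on the one-step-ahead target."""
    return [(x, y[0]) for x, y in pairs]


def direct_targets(
    pairs: List[Tuple[Window, Window]], h: int
) -> List[Tuple[Window, float]]:
    """Direct strategy: a separate model per step h, trained on the h-step target."""
    return [(x, y[h - 1]) for x, y in pairs]


# MIMO strategy: the pairs are used as they are, so that one multi-output
# model maps the input window to the whole vector of future values.


def iterated_forecast(
    one_step_model: Callable[[Window], float],
    history: Sequence[float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Roll a one-step model forward, feeding each prediction back as input."""
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(horizon):
        y_hat = one_step_model(window)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # the prediction replaces the oldest lag
    return predictions
```

The feedback loop in `iterated_forecast` is the point at which one-step errors can accumulate over the horizon, whereas the direct and MIMO targets are always built from observed values.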
Besides the statistical metrics, in the context of influenza prediction it is of great importance to accurately predict the situation during influenza outbreak periods. Before defining the outbreak metrics, it is necessary to define the influenza outbreak seasons of the whole ILI time series. As discussed earlier, the specific rules mentioned in [14, 15], which can be used to judge whether the ILI value at each time point belongs to an influenza outbreak period, are not suitable for the adopted China weekly ILI rates. Therefore, the outbreak periods in this study are roughly defined as the winter months from the 45th week of one year to the 8th week of the next year, referring to the description of the Chinese National Influenza Center. In this case, two influenza outbreak metrics, the Peak Week Error (PWE) and the Outbreak MAE, are employed to assess the forecasting performance during influenza outbreak periods.
For each specific outbreak season, an influenza peak is defined as the week with the highest ILI rate in the outbreak period. The PWE of a given outbreak season is the absolute difference between the observed peak week and the forecasted peak week [17], as given by
$$\text{PWE} = \left| w_{\text{peak}} - \hat{w}_{\text{peak}} \right|$$
where $w_{\text{peak}}$ denotes the observed week of the peak and $\hat{w}_{\text{peak}}$ denotes the predicted week of the peak.
PWE is aimed at the single peak point of a given influenza outbreak season, while the Outbreak MAE is used to evaluate the forecasting performance across the whole outbreak period. Outbreak MAE is the mean absolute error over a consecutive influenza outbreak period, as given by
$$\text{Outbreak MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|$$
where $n$ denotes the number of ILI rate values encompassed in a given outbreak season, $y_t$ denotes the $t$th observed ILI rate in the given outbreak season, and $\hat{y}_t$ denotes the $t$th predicted ILI rate in the given outbreak season. Three influenza outbreak seasons are included in the test samples of N_ILI and S_ILI.
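The four evaluation metrics can be computed with a few lines of standard-library Python. The sketch below assumes that the observed and predicted ILI rates are given as equally long lists on the original scale, that observed rates are non-zero, and that the inputs to `peak_week_error` and `outbreak_mae` have already been restricted to a single outbreak season. The function names are illustrative.

```python
import math
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the series."""
    total = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


def peak_week_error(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Absolute gap, in weeks, between the observed and predicted peak.

    Both sequences cover the same outbreak season; ties resolve to the
    earliest week with the maximum value.
    """
    observed_peak = max(range(len(y_true)), key=lambda t: y_true[t])
    predicted_peak = max(range(len(y_pred)), key=lambda t: y_pred[t])
    return abs(observed_peak - predicted_peak)


def outbreak_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over the weeks of one outbreak season."""
    total = sum(abs(y - p) for y, p in zip(y_true, y_pred))
    return total / len(y_true)
```

If the model is trained on log10-transformed rates, the predictions would first be mapped back with `10 ** value` before these functions are applied, so that the errors refer to the original scale.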
The scikit-learn v0.24.1 package for Python is employed to build the standard RBF-kernel SVR models with the iterated and direct strategies and the MLP models with all three strategies. MSVR is rewritten in Python according to the MATLAB program provided by Tuia et al. [50]. Table 2 shows the hyperparameter space for SVR and MLP. The ILI rates of the previous 20 weeks are selected as the candidate input variables. There are also some adjustable parameters in the CLPSO algorithm, which are determined by trial and error considering the trade-off between prediction accuracy and computational time. Specifically, the termination condition is that the total number of iterations exceeds 200 or the global fitness value does not improve for a given number of iterations; the inertia weight decreases linearly from 0.9 to 0.4 as the iterations increase; the number of particles in the swarm is set to 8; and the refreshing gap is set to 8.
The prediction performance of the two adopted machine learning models (SVR and MLP) across the three multi-step strategies (Iter, Dir and MIMO, abbreviating the iterated strategy, the direct strategy and the MIMO strategy respectively in the following tables), in terms of the two statistical metrics (MAPE and RMSE) and the two influenza outbreak metrics (PWE and Outbreak MAE) over prediction horizons ranging from 2 to 10 on the two weekly ILI rate series (i.e., N_ILI and S_ILI), is reported in Table 3 and Table 4 respectively.
Fig. 7.
Considering the presented results on N_ILI, several observations can be drawn.
2) The iterated strategy is also competitive on the above metrics, since there is no significant difference between the iterated and MIMO strategies in many cases, and it is even better for short horizons (H = 2, 3). More importantly, the iterated strategy ranks first over most of the horizons by PWE.
3) Comprehensively considering all the metrics, a horizon of 4 seems to be a turning point: the iterated strategy has an edge for shorter horizons (H = 2 and 3), while the MIMO strategy is potentially more adaptive for longer horizons (H ≥ 4).
As per the results in the case of S_ILI, the corresponding observations can be deduced from Table 4. In summary, the differences among the three modelling strategies are smaller on the Southern China ILI rates than on the Northern China ILI rates, except for PWE. Despite this, the results for both ILI series share some common ground.
1) The MIMO strategy can be identified as the most competitive multi-step-ahead modelling strategy, comprehensively considering all the accuracy metrics, which evaluate the forecasting performance from different perspectives.
2) A horizon of 4 also acts as a turning point for the ranking of the three strategies in most cases of RMSE, PWE and Outbreak MAE. The iterated strategy usually ranks well for short-term prediction (H = 2 and 3), while the MIMO strategy has advantages over longer prediction horizons (H ≥ 4).
3) The iterated strategy yields the lowest PWE over the majority of horizons with MLP as well, though it loses to the MIMO strategy implemented with SVR.
The performance of the different strategies demonstrated in the above results is consistent with the theoretical characteristics of multi-step-ahead modelling strategies mentioned in previous literature [55, 56]. Specifically, the iterated strategy accumulates the errors of the trained single-output model at each step [55], which leads to relatively large prediction errors as the forecasting horizon increases. However, if the ML model performs well for one-step-ahead forecasting, this strategy can be quite useful for short-term multi-step-ahead prediction, such as H = 2 and 3 in this context.
Moreover, considering the mechanism of the iterated strategy, it can capture some possible future information by adopting the predicted value as part of the input variables, and thus could be more sensitive to the appearance of outbreaks and peaks, which may explain its superior performance in terms of PWE. In contrast, the MIMO strategy preserves the stochastic dependency among the predicted sequence of values, which contributes to modeling the underlying dynamics of the time series [28, 56], and thereby can be more suitable for longer prediction horizons.
Multi-step-ahead influenza prediction is of great significance for planning appropriate actions and timely responses to potential influenza outbreaks and epidemics. However, prior research seldom considers the different strategies for multi-step-ahead forecasting when implementing models for influenza prediction. Addressing this research gap, this study explores the performance of three basic multi-step-ahead modelling strategies (i.e., the iterated strategy, the direct strategy and the MIMO strategy) implemented with machine learning models, using weekly ILI data of Northern and Southern China. In addition, to ensure the high quality of each model under each strategy, a CLPSO-based modeling framework is developed to accomplish the model tuning tasks effectively and efficiently, conducting parameter optimization and feature selection in a unified optimization process. In terms of performance evaluation, besides the commonly used statistical metrics such as MAPE and RMSE, measurements of the prediction performance during influenza outbreak seasons are also adopted in order to fully analyze and assess the different strategies.
Results of the experiments on two well-established machine learning methods, SVR and MLP, show that the MIMO strategy demonstrates the best overall forecasting performance over a wide range of prediction horizons, while the iterated strategy has advantages in reducing the time error between the occurrence of the predicted peak value and the true peak value during an influenza outbreak period. Furthermore, different strategies are appropriate for prediction horizons of different lengths. In the context of this study, several evaluation metrics indicate the same trend: the iterated strategy is superior for short-term influenza prediction over horizons shorter than four, while the MIMO strategy is more competitive for longer horizons. We hope this study will encourage more consideration of multi-step-ahead modelling strategies and provide a reliable reference for the selection of an appropriate strategy when conducting multi-step-ahead influenza prediction, which could further contribute to decision-making in the domain of infectious disease control and medical management.
As for the limitation of this study, the conclusions are drawn based on univariate prediction models that use only the lags of the ILI rate as input variables. Since the main purpose of this study is to analyze the differences in prediction performance under different multi-step-ahead strategies, the results could be more reliable without the interference of exogenous variables, which may have complex relationships with the predicted target. However, for the purpose of appropriately choosing multi-step-ahead strategies, future efforts could be made to explore the improvement and development of hybrid modeling frameworks for multi-step-ahead influenza prediction involving commonly used exogenous variables such as weather conditions, search engine queries and social media data. Optimization algorithms for the model tuning tasks in the framework could be refined for more complex situations as well.
Collaborative efforts to forecast seasonal influenza in the United States
Yamana, others, A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States
Influenza forecasting in human populations: a scoping review
A systematic review of studies on forecasting the dynamics of influenza outbreaks, Influenza Other Respir
A review of influenza detection and prediction through social networking sites
Adaptive nowcasting of influenza outbreaks using Google searches
Forecasting influenza in Hong Kong with Google search queries and statistical model fusion
Complementing the power of deep learning with statistical model fusion: Probabilistic forecasting of influenza in Dallas County
A Novel Data-Driven Model for Real-Time Influenza Forecasting
Near-term forecasts of influenza-like illness
Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study
A comparative study on predicting influenza outbreaks using different feature spaces: application of influenza-like illness data from Early Warning Alert and Response System in Syria
Monitoring seasonal influenza epidemics by using internet search data with an ensemble penalized regression model
Zhang, others, Forecasting influenza epidemics from multi-stream surveillance data in a subtropical city of China
Using electronic health records and Internet search information for accurate influenza forecasting
Optimal multi-source forecasting of seasonal influenza
Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model
Forecasting influenza-like illness dynamics for military populations using neural networks and social media
Forecasting influenza epidemics by integrating internet search queries and traditional surveillance data with the support vector machine regression model in Liaoning
Time series analysis of weekly influenza-like illness rate using a one-year period of factors in random forest regression
LSTM Recurrent Neural Networks for Influenza Trends Prediction
Multi-step prediction for influenza outbreak by an adjusted long short-term memory
Attention-based recurrent neural network for influenza epidemic prediction
Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions
Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the US
Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches
Multi-step-ahead time series prediction using multiple-output support vector regression
Time series prediction: forecasting the future and understanding the past, Routledge
A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting
Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices
Particle swarm optimization of partitions and fuzzy order for fuzzy time series forecasting of COVID-19
A new hybrid model for wind speed forecasting combining long short-term memory neural network, decomposition methods and grey wolf optimizer
A new forecasting model with wrapper-based feature selection approach using multi-objective optimization technique for chaotic crude oil time series
Intelligent optic disc segmentation using improved particle swarm optimization and evolving ensemble models
ELM-based adaptive neuro swarm intelligence techniques for predicting the California bearing ratio of soils in soaked conditions
A chaos recurrent ANFIS optimized by PSO to predict ground vibration generated in rock blasting
A hybrid computing model to predict rock strength index properties using support vector regression
Digital currency forecasting with chaotic metaheuristic bio-inspired signal processing techniques
Feature subset selection via an improved discretization-based particle swarm optimization
Comprehensive learning particle swarm optimizer for global optimization of multimodal functions
Multi-step-ahead neural networks for flood forecasting
Multi-period forecasts of volatility: Direct, iterated, and mixed-data approaches
Prediction by exponentially weighted moving averages and related methods
Multi-output nonparametric regression
A tutorial on support vector regression
Financial time series forecasting using support vector machines
Application of support vector machines in financial time series forecasting
Artés-Rodríguez, Multi-dimensional function approximation and regression estimation
Camps-Valls, Multioutput support vector regression for remote sensing biophysical parameter estimation
Multivariate output global sensitivity analysis using multi-output support vector regression
Multi-target support vector regression via correlation regressor chains
Feature selection for support vector machines by means of genetic algorithm
Forecasting: principles and practice, OTexts
Multistep prediction in autoregressive processes
Long term time series prediction with multi-input multi-output local learning