key: cord-0920454-3q0icf93
authors: Liu, Xin; Li, Shunlong
title: Impact of COVID-19 pandemic on low-carbon shared traffic scheduling under machine learning model
date: 2021-06-25
journal: Int J Syst Assur Eng Manag
DOI: 10.1007/s13198-021-01176-x
sha: fcd9fe67a4c3d581324d7ae293ed75ea3df21e98
doc_id: 920454
cord_uid: 3q0icf93

The present work aims to expand the application of machine learning models in predicting and identifying traffic flow data and to provide a reference for the scheduling and management of shared traffic during the Coronavirus Disease 2019 (COVID-19) pandemic. First, a time segmentation-based prediction model, denoted as OSA-SVM, is proposed by combining the Support Vector Machine (SVM), chosen for its classification performance, with the Optimal Segmentation Algorithm (OSA). Second, an algorithm for generating a shared traffic flow sequence is proposed based on the historical data of shared traffic flow. Finally, a shared traffic flow moment identification model is constructed based on the label propagation algorithm and the Random Forest (RF) model. Comparative analysis suggests that the OSA-SVM regression prediction model can accurately fit the fluctuations in the shared traffic flow data; however, its overall effect is not good, as the predictions deviate from the actual traffic sequence. Introducing historical data for weighting improves the goodness-of-fit of the regression prediction model significantly, which remains at 0.66–0.71 after one week. The stochastic gradient descent algorithm provides a better weighting effect. The RF model shows the best recognition effect for the shared traffic data stream compared with other models and performs well in dealing with the imbalance and instability problems. The proposed model and algorithm achieve good prediction and recognition accuracy in shared traffic scheduling and can play an active role in traffic control during COVID-19 prevention and control.

Since its outbreak in December 2019, Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world, bringing profound changes to human society. Shared traffic is important for low-carbon living and traveling; it is a basis for low-carbon urban development (Palmer et al. 2018). However, the outbreak of the COVID-19 pandemic has affected shared traffic significantly. Against such a background, traffic problems have become more apparent, such as traffic jams, traffic accidents, air pollution, and haze. In this regard, predicting shared traffic data and recognizing the key information are vital for shared traffic scheduling. If the traffic information can be learned in advance, before traffic is controlled and scheduled in real time, both carbon emissions and traffic accidents can be reduced. Moreover, the handling and resolution of unexpected traffic conditions can also be improved (Shao et al. 2017).

On this basis, the Support Vector Machine (SVM) and the Random Forest (RF) model are introduced to explore the prediction and scheduling of shared traffic flow data in the context of the COVID-19 pandemic, in an effort to provide a reference for the scheduling and management of shared traffic.

Experts and scholars worldwide have researched traffic scheduling and traffic flow prediction. Alves and Cordeiro (2021) predicted the measured values of highway traffic flow. They proposed a new adaptive algorithm to monitor the interconnected highways in the complex network independently. The traffic flow could be accurately predicted using local traffic only. The accuracy of predicting a highway's traffic flow within 15 min could reach 95.5%. Regarding the traffic prediction problem, Pavlyuk (2021) proposed a space-time cross-validation approach. Analysis found that this approach could estimate the model's generalization ability in the space-time dimension, namely, the ability to predict traffic flow data of unknown surrounding road sections. This approach also made it possible to assess model stability. Alkheder and Alrukaibi (2020) adjusted the duration of traffic lights and constructed an adaptive traffic system based on fuzzy logic code in MATLAB. The system could reduce the time delay caused by traffic jams. The delay reduction at the intersection facing the Engineering Society was 63.82%, and that at the intersection facing the British Embassy was 11.82%. Pamula (2019) analyzed the methods to evaluate traffic conditions in intelligent transportation systems, focusing on the role of deep learning networks based on Multi-Layer Perceptrons (MLPs) and autoencoders, as well as their applications in traffic prediction, which provided a practical reference for traffic data analysis.

In China, scholars have also discussed the problems of traffic scheduling and traffic flow prediction. Zhang and Huang (2018) proposed a traffic flow prediction model based on Deep Belief Networks (DBNs) and fine-tuned it using the conjugate gradient algorithm. This model could provide excellent performance in different periods. Zhao et al. (2019) proposed a learning algorithm that parallelizes the DBN learning process. Then, they applied this algorithm to predict traffic flow from actual traffic data. Tests proved that this algorithm could maintain feature learning. Fu et al. (2017) used the Monte Carlo method to simulate random traffic flow. After establishing a traffic flow growth prediction model, they conducted a statistical analysis of the measured traffic flow of the Nanjing Third Yangtze River Bridge from 2006 to 2010. Mao et al. (2018) employed the Convolutional Neural Network (CNN) as a deep learning architecture to select the best path and packet forwarding in the switch through the controller. This architecture presented great application potential in tracking and collecting traffic data flow. Cheng et al. (2019) explored the high-precision positioning of high-speed trains. They established and optimized a prediction model combining the K-Means algorithm with the Least Squares SVM (LSSVM). Results confirmed this model's good adaptability and real-time performance, which could play a significant role in positioning high-speed trains with actual data. Wu et al. (2021) proposed an airport passenger flow prediction model based on a two-stage learning framework. The first stage processed time sequence features, and the second stage integrated the prediction results. Analysis found that the model performed better and more stably than fusion models.

To sum up, deep learning approaches and machine learning techniques have been applied to predict traffic flow. However, works that employ SVM and RF simultaneously are rarely reported, and so is the application of these two models against the background of the COVID-19 pandemic.

While solving the linear regression problem, the following training samples are given:

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \quad (1)$$

where $n$ represents the total number of samples, $x_i$ represents an $m$-dimensional vector, and $y_i \in \mathbb{R}$. Then, Eq. (1) is solved by the SVM regression algorithm. To obtain a better fitting effect, the regression function can be expressed as:

$$f(x) = \langle w, x \rangle + b \quad (2)$$

where $w$ represents an $m$-dimensional vector, $b \in \mathbb{R}$, and $\langle w, x \rangle$ represents the inner product of $w$ and $x$.

Solving the regression function is essentially the process of determining the parameters $w$ and $b$. Based on the application of SVM in the classification problem, the solution of the regression function can be transformed into that of a convex optimization problem. The equation is:

$$\min \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon \\ \langle w, x_i \rangle + b - y_i \le \varepsilon \end{cases} \quad (3)$$

where $\varepsilon$ represents the accuracy, $y_i$ corresponds to the actual target of the sample point, $\min$ represents the minimization with respect to $w$, and "subject to" introduces the constraints of the convex optimization problem.

The meaning of other symbols or parameters is the same as Eq. (2).

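On this basis, to avoid overfitting, slack variables $\xi_i$, $\xi_i^*$ and a penalty factor $C$ are introduced. Then, the optimization problem is transformed into:

$$\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \quad (4)$$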
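The soft-margin objective described above (the norm of the weight vector plus the penalty factor times the total slack) can be illustrated with a minimal sketch. It only evaluates the objective for a given linear model; it does not solve the optimization. The function name `svr_objective` and the toy data are assumptions made for illustration, not part of the original work.

```python
def svr_objective(w, b, xs, ys, eps, C):
    """Evaluate the soft-margin linear SVR primal objective for fixed w and b.

    Returns (objective, slacks), where slacks[i] = (xi_i, xi_i_star) are the
    smallest slack values that satisfy the constraints for sample i.
    """
    slacks = []
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        residual = y - pred
        xi_upper = max(0.0, residual - eps)   # target lies above the eps-tube
        xi_lower = max(0.0, -residual - eps)  # target lies below the eps-tube
        slacks.append((xi_upper, xi_lower))
    norm_sq = sum(wi * wi for wi in w)
    objective = 0.5 * norm_sq + C * sum(u + l for u, l in slacks)
    return objective, slacks


# Toy data (hypothetical): the third sample is 1.0 away from the line y = x.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 3.0]
objective, slacks = svr_objective(w=[1.0], b=0.0, xs=xs, ys=ys, eps=0.5, C=10.0)
```

Only the third sample leaves the tube of half-width 0.5, so it alone contributes slack; a larger `C` would penalize that deviation more heavily relative to the flatness term.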

Generally, the SVM regression algorithm is solved by constructing the Lagrange function (Lal and Datta 2018). In this way, the regression function of the SVM regression algorithm can be expressed as:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \quad (5)$$

where $\alpha_i$ and $\alpha_i^*$ represent the Lagrange multipliers, whose admissible values are bounded by the penalty factor $C$.

In this constructed form, the SVM regression algorithm is more convenient for processing regression tasks. When the SVM model processes a regression task, its parameters, including the penalty factor, should be tuned to the sample. The above regression function is only applicable to linear regression problems. Other techniques are needed to solve nonlinear regression problems such as shared traffic condition prediction. Therefore, to predict the shared traffic flow, a nonlinear SVM is combined with the Optimal Segmentation Algorithm (OSA) to propose an SVM regression prediction model that can segment time sequences.
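One common way to make SVM regression nonlinear is to replace the inner product in the regression function with a kernel. The sketch below assumes a radial basis function (RBF) kernel; the source does not state which kernel is used, and the support vectors and dual coefficients here are hypothetical values rather than the result of training.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel SVR prediction: sum_i coef_i * K(x_i, x) + b.

    dual_coefs[i] plays the role of (alpha_i - alpha_i^*) for support vector i.
    """
    return b + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical, untrained values used only to show the shape of the computation.
support_vectors = [[0.0], [1.0]]
dual_coefs = [1.0, -1.0]
prediction = svr_predict([0.0], support_vectors, dual_coefs, b=0.5, gamma=1.0)
```

In practice the coefficients and the offset come from solving the dual problem, and `gamma` is tuned together with the penalty factor.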

OSA can classify ordered samples. Suppose a set of ordered samples:

where $n$ represents the total number of samples, and $X_i$ represents an $m$-dimensional vector. Also, each class is defined as:

where $G_{i,j}$ represents the class of the samples.

Steps to implement OSA are (1) defining the inner diameter of the class, (2) defining the loss function, and (3) solving the optimal segmentation. The equation for solving the inner diameter of the class can be expressed as:

where $\bar{X}_G$ represents the mean vector of the samples in the class, and $r$ represents the inner radius. Furthermore, the loss function can be expressed as:

where the diameter term represents the inner diameter of the $t$-th class, and $L$ represents the loss function.

The equation to minimize the loss function is:

where $L[b(n, k)]$ denotes the loss function of dividing the $n$ samples into $k$ classes under the segmentation $b(n, k)$, which is to be minimized.

The optimal segmentation problem can be solved according to the above equation. The first step is to find the segmentation point. The equation is:

According to Eq. (11), the classification result $G_k = \{j_k, j_k + 1, \ldots, n\}$ corresponding to the $k$-th class can be deduced. According to Eq. (12), the classification result $G_{k-1} = \{j_{k-1}, j_{k-1} + 1, \ldots, j_k - 1\}$ corresponding to the $(k-1)$-th class can be deduced. The remaining classification results can be obtained by analogy.
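The three OSA steps (class diameter, loss function, optimal segmentation) can be sketched with a small dynamic program in the style of Fisher's optimal partition of ordered samples. This is an illustrative sketch under stated assumptions: samples are scalar flow values, the class diameter is taken as the within-class sum of squared deviations from the class mean, and the function name `optimal_segmentation` is hypothetical.

```python
def optimal_segmentation(values, k):
    """Split an ordered sequence into k contiguous classes with minimal loss.

    Returns (loss, starts): the minimal total within-class sum of squared
    deviations and the 0-based start index of each class.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums give each class diameter in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def diameter(i, j):
        """Sum of squared deviations of values[i..j] (inclusive) from their mean."""
        total = s1[j + 1] - s1[i]
        return max(0.0, (s2[j + 1] - s2[i]) - total * total / (j - i + 1))

    inf = float("inf")
    # best[c][j]: minimal loss of dividing the first j samples into c classes.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for s in range(c - 1, j):  # the last class covers values[s..j-1]
                cand = best[c - 1][s] + diameter(s, j - 1)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = s

    # Backtrack: find the start of class k first, then class k-1, and so on.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = cut[c][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts


# Toy flow series with three obvious regimes.
flows = [1, 1, 1, 5, 5, 5, 9, 9]
loss, starts = optimal_segmentation(flows, 3)
```

The backtracking step mirrors the description in the text: the segmentation point of the last class is recovered first, and the earlier classes follow by analogy.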

With reference to the above descriptions, Fig. 1 below illustrates the implementation of the time-segmented OSA-SVM regression model based on shared traffic flow prediction.

The actual traffic situation is affected by multiple factors.

To improve the performance of the OSA-SVM regression prediction model, some historical data of shared traffic flow are introduced. Based on the constructed OSA-SVM regression prediction model under time segmentation, a shared traffic flow historical data matrix is constructed to realize the shared traffic flow sequence generation algorithm. Specifically, this algorithm is implemented via initialization, prediction, and update.

(1) In the initialization stage, time segmentation and data preprocessing are performed on the original historical dataset. The shared traffic flow historical data are initialized. The sample queue corresponding to the top k shared traffic flow data in the sequence to be generated is the input of the regression prediction model. The results are weighted according to actual needs.

(2) In the prediction stage, the prediction reference value can be obtained by inputting the sample queue into the regression prediction model. The historical data value corresponding to the current prediction time is obtained from the historical data matrix. Finally, the predicted value can be obtained.

(3) In the update stage, the predicted value obtained above is placed in the corresponding sample queue. Then, the sample queue processing is updated according to the historical data matrix.
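The initialization, prediction, and update stages can be sketched as a rolling loop. This is a minimal sketch under assumptions: the predicted value is taken as a convex combination of the model's reference prediction and the historical value (the exact weighting used in the source is not recoverable here), and `generate_sequence`, `model`, and `weight` are hypothetical names.

```python
from collections import deque


def generate_sequence(seed, history, model, weight, steps):
    """Generate `steps` future flow values by rolling the predictor forward.

    seed:    the k most recent flows, used to initialize the sample queue
    history: historical flow value for each future time slot
    model:   callable mapping a list of k flows to a reference prediction
    weight:  share given to the historical value (0.0 uses the model only)
    """
    queue = deque(seed, maxlen=len(seed))      # initialization stage
    generated = []
    for t in range(steps):
        reference = model(list(queue))         # prediction stage
        value = (1.0 - weight) * reference + weight * history[t]
        generated.append(value)
        queue.append(value)                    # update stage: oldest sample drops out
    return generated


# Stand-in predictor (assumption): the mean of the current sample queue.
generated = generate_sequence(
    seed=[10.0, 20.0],
    history=[30.0, 30.0, 30.0],
    model=lambda q: sum(q) / len(q),
    weight=0.5,
    steps=3,
)
```

Because each generated value is fed back into the queue, errors can accumulate over long horizons, which is one reason for anchoring the predictions to historical data.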

The historical data of shared traffic flow is weighted by prior knowledge before constructing the shared traffic flow sequence generation algorithm. The corresponding calculation is shown in Eq. (13) below.

where $W_i$ represents the weight vector, and $m$, $i$, and $j$ are parameters. While weighting the historical data of shared traffic flow, if the prior knowledge is insufficient or erroneous, or if the actual road sections of shared traffic differ, the effect of weighting by prior knowledge will not be good enough. Therefore, the Stochastic Gradient Descent (SGD) algorithm is further adopted to weight the historical data of shared traffic flow. The SGD algorithm is a machine learning approach with broad applications in solving linear regression and optimization problems. It requires a loss function to complete the solution. The loss function to solve the regression problem is:

where $L$ represents the loss function, $Y$ is the dependent variable, $X$ is the independent variable, $y_i$ is the value corresponding to the dependent variable, $x_i$ is the value corresponding to the independent variable, and $f_W$ represents the prediction function determined by the weight vector $W$. Given a learning rate, the SGD algorithm updates its parameters along the negative gradient direction, which can be expressed as:

where $\eta$ represents the learning rate, and $t$ is the parameter. To avoid overfitting, the SGD algorithm is implemented with regularization. The corresponding equation is:

where $\lambda$ is the regularization parameter.
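A regularized stochastic gradient descent update of the kind described above can be sketched for a linear model with a squared loss. The per-sample loss (half the squared error plus an L2 penalty), the function name `sgd_fit`, and the toy data are assumptions for illustration; the source does not specify these details.

```python
import random


def sgd_fit(xs, ys, lr=0.1, lam=0.0, epochs=500, seed=0):
    """Fit linear weights by SGD on 0.5 * err^2 + 0.5 * lam * ||w||^2 per sample."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                       # visit samples in random order
        for i in order:
            pred = sum(wj * xj for wj, xj in zip(w, xs[i]))
            err = pred - ys[i]
            # Step against the gradient: err * x from the loss, lam * w from the penalty.
            w = [wj - lr * (err * xj + lam * wj) for wj, xj in zip(w, xs[i])]
    return w


# Noise-free toy data generated from y = 2 + 3 * x; the first feature is a bias term.
xs = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
ys = [2.0 + 3.0 * x[1] for x in xs]
w = sgd_fit(xs, ys)
```

With `lam=0.0` the weights approach the generating coefficients; a positive `lam` shrinks them toward zero, trading some fit for stability.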

While processing the actual shared traffic flow, the moments at which the traffic conditions change should be analyzed and explored. The label propagation algorithm can solve this problem (Lotfi Shahreza et al. 2017; Chang et al. 2017). By applying the label propagation algorithm, the class labels can be spread to the whole sample. The algorithm constructs a graph structure in which all the samples are connected with each other. The weight between samples can be expressed as:

where $w_{i,j}$ represents the weight, and $X_i$, $X_j$ represent the samples. Then, the label transfer matrix is constructed. The element at each position in the matrix can be expressed as:

where $P_{i,j}$ represents the element of the label transfer matrix, $i$ represents the $i$-th row, and $j$ represents the $j$-th column. Based on the label propagation algorithm, Fig. 2 below shows the process of marking the corresponding time point of the shared traffic flow warning.
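The graph construction and label transfer matrix described above can be sketched as follows. The sketch assumes Gaussian weights on squared Euclidean distance and a row-normalized transfer matrix, which is a common formulation of label propagation; the bandwidth `sigma`, the function name, and the toy data are assumptions and are not taken from the source.

```python
import math


def label_propagation(points, labels, sigma=1.0, iters=100):
    """Spread class labels from labeled samples (int) to unlabeled ones (None)."""
    n = len(points)
    classes = sorted({c for c in labels if c is not None})

    # Fully connected graph: weight decays with squared distance.
    W = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / sigma ** 2)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Label transfer matrix: each row of W normalized to sum to one.
    P = [[W[i][j] / sum(W[i]) for j in range(n)] for i in range(n)]

    # One-hot rows for labeled samples, uniform rows for unlabeled samples.
    F = []
    for y in labels:
        if y is None:
            F.append([1.0 / len(classes)] * len(classes))
        else:
            F.append([1.0 if y == c else 0.0 for c in classes])

    for _ in range(iters):
        new_F = [
            [sum(P[i][j] * F[j][c] for j in range(n)) for c in range(len(classes))]
            for i in range(n)
        ]
        for i, y in enumerate(labels):
            if y is not None:
                new_F[i] = F[i]                # clamp the known labels
        F = new_F

    return [classes[max(range(len(classes)), key=lambda c: F[i][c])] for i in range(n)]


# Two well-separated groups with one labeled sample each (toy data).
points = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
labels = [0, None, None, None, None, 1]
propagated = label_propagation(points, labels)
```

Each unlabeled sample ends up with the label of the group it is closest to, because nearby samples exchange far more label mass than distant ones.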

The label propagation algorithm can assign class labels to the sample data; however, it is challenging to apply this algorithm to recognize actual scenes in real time. Hence, RF and the label propagation algorithm are combined, considering RF's advantages in unbalanced classification. Afterward, the combined algorithm is applied to recognize the early warning time points of the shared traffic flow in real time. Figure 3 below displays the recognition structure based on RF and the label propagation algorithm.

To recognize the early warning time points and predict the shared traffic flow, the KUL Belgium Traffic Sign Dataset is chosen (Li et al. 2019). It includes $10^4$ traffic sign annotations and thousands of traffic signs. The data are captured with high-resolution cameras and include 4 video sequences and $1.6 \times 10^4$ background images. The BIT vehicle dataset is selected as well. This dataset contains images of nearly 10,000 vehicles, including buses, minivans, mini-buses, Sports Utility Vehicles (SUVs), cars, and trucks. These images are taken at different times and places, and the positions of the vehicles are annotated in advance. Hence, this dataset can be used to evaluate an algorithm's vehicle detection performance.

The predicted values output by the model weighted by prior knowledge, the model weighted by SGD, and the model without weighting are compared to analyze the accuracy of the proposed OSA-SVM regression prediction model. The effectiveness of the shared traffic flow sequence generation algorithm is tested by comparing the goodness-of-fit of the prediction model before and after weighting, as well as the goodness-of-fit after dynamically adding measured data. Finally, SVM, Decision Tree (DT),

Before and after weighting, the proposed OSA-SVM model presents varying prediction accuracies, as shown in Fig. 4 below. In the figure, numbers 1-3 correspond to the original regression prediction model without weighting, the regression prediction model weighted by prior knowledge, and the regression prediction model weighted by SGD, respectively.

The proposed OSA-SVM model can accurately fit the fluctuations and changes in the actual shared traffic data flow. Meanwhile, the points of rapid change in the shared traffic data flow can also be well captured. However, the overall prediction effect is slightly rough. As the prediction proceeds, the predicted sequence tends toward similar periodic changes and deviates from the actual sequence. For the one-week traffic flow sequence, the model weighted by prior knowledge can grasp the key inflection points relatively accurately despite some unstable fluctuation points. The model weighted by SGD shows the same trend. Figure 5 illustrates the goodness-of-fit of the shared traffic flow data sequences generated by different schemes. The effect of using the initial model to generate a shared traffic flow data sequence without weighting is not ideal. After the model is weighted by prior knowledge or SGD, its goodness-of-fit still decreases initially, but the decrease is distinctly slower than that of the initial model. Specifically, the unweighted model's goodness-of-fit drops to around 0.6 at the beginning of generating the shared traffic flow data sequence and stays at a lower level thereafter. After weighting, the model's goodness-of-fit remains at a relatively high level; after a week, it is still within the range of 0.66-0.71. After the measured data are added dynamically, the weighted prediction scheme can still provide excellent prediction accuracy. The model weighted by the SGD algorithm shows the better prediction effect.
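The goodness-of-fit values quoted above can be related to a concrete computation. The sketch below assumes that goodness-of-fit means the coefficient of determination (R squared); the source does not state the exact definition, and the numbers in the example are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical flows, not taken from the paper's experiments.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
fit = r_squared(actual, predicted)
```

A value of 1 means the generated sequence matches the measured one exactly, while values near 0 mean it explains little more than the mean flow would.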

The performance of different machine learning models in recognizing early warning time points of the shared traffic flow is analyzed and compared, as shown in Fig. 6 below. Figure 6 shows the poor performance of naive Bayes. Although it presents a higher recall rate on the second type of samples, it performs worse on the other indicators. In contrast, the RF model shows the best performance.
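Per-class precision and recall, the kind of indicators used when comparing classifiers on unbalanced data, can be computed as in the following sketch. The labels in the example are invented for illustration and do not reproduce the results in Fig. 6.

```python
def precision_recall(y_true, y_pred, cls):
    """Return (precision, recall) for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: class 1 marks an early warning time point.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred, cls=1)
```

A model can reach a high recall on one class simply by predicting it often, which is why recall has to be read together with precision and the scores on the other classes.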

The accuracy of RF models is analyzed and compared, and the results are shown in Fig. 7 below.

With undersampling, the RF model shows poor accuracy in recognizing the shared traffic flow data. With oversampling, the training efficiency is worse than that under the weight distribution scheme because the sample size increases. The RF models using other sampling technologies all show higher recognition accuracy. Hence, the RF model performs well in dealing with the imbalance and instability problems.
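The three ways of handling class imbalance mentioned above (undersampling, oversampling, and class weighting) can be sketched with the standard library. The weighting heuristic used here, the number of samples divided by the number of classes times the class count, is a common choice and an assumption; the source does not describe its weight distribution scheme, and all function names are hypothetical.

```python
import random
from collections import Counter


def class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for c in counts:
        idx = [i for i, y in enumerate(labels) if y == c]
        kept.extend(rng.sample(idx, target))
    kept.sort()
    return [samples[i] for i in kept], [labels[i] for i in kept]


def oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    kept = []
    for c in counts:
        idx = [i for i, y in enumerate(labels) if y == c]
        kept.extend(idx + [rng.choice(idx) for _ in range(target - len(idx))])
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: eight normal time points and two warning time points.
samples = list(range(10))
labels = [0] * 8 + [1] * 2
weights = class_weights(labels)
under_x, under_y = undersample(samples, labels)
over_x, over_y = oversample(samples, labels)
```

Undersampling discards most of the majority class, oversampling enlarges the training set, and weighting keeps the data unchanged while making minority errors count more, which matches the trade-offs discussed in the text.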

The global outbreak of COVID-19 has impacted human society significantly. Shared traffic plays a vital role in constructing a green and low-carbon city; it also lays a solid foundation for the healthy and sustainable development of a city (Miramontes et al. 2017). During the COVID-19 pandemic, shared traffic scheduling becomes more important than in the past. To mitigate the adverse impact of shared traffic on cities during the pandemic, machine learning models are introduced to predict and recognize the shared traffic flow data. Results demonstrate that the OSA-SVM regression prediction model based on time segmentation can accurately fit the fluctuations and changes in the actual shared traffic data flow. It can also capture the data points with large fluctuations. In particular, after the shared traffic data are weighted, the model grasps the changes in the traffic flow sequence and the key points more accurately.

The RF model is applied to recognize the shared traffic flow time points. Even an unprocessed RF model can provide a high recognition accuracy for shared traffic data. However, different sampling premises will have different effects on RF's recognition accuracy. Undersampling leads to poor recognition accuracy, and oversampling also leads to reduced recognition accuracy. In contrast, once the label propagation algorithm is introduced, the RF model shows better accuracy in recognizing the shared traffic data flow.

In summary, both SVM and RF can play a positive role in predicting and recognizing shared traffic flow data after proper processing. Most importantly, they can do a good job even without any processing, providing an effective means to schedule the shared traffic during the COVID-19 pandemic.

Against the background of COVID-19, the role of machine learning models in predicting and recognizing shared traffic flow data is discussed in the present work. The following conclusions are drawn: (1) the OSA-SVM shared traffic flow regression prediction model can accurately predict the fluctuations of the actual data of shared traffic flow and precisely capture the changes in details; however, the overall effect is flawed. (2) Weighting by prior knowledge and SGD can improve the model's prediction accuracy significantly. The weighted model can basically grasp the changes in traffic flow data within some periods. (3) Compared with other models, the RF model shows the best performance in recognizing shared traffic flow data, with the highest recognition accuracy.

Nevertheless, there are some weaknesses in the present study. The proposed model can provide some reference values for shared traffic flow; however, it cannot accurately capture the strong changes in traffic flow data. Also, criteria to evaluate the application of label propagation algorithms are lacking. Hence, more factors and evaluation indicators shall be incorporated in future work.

Enhancing pedestrian safety, walkability and traffic flow with fuzzy logic

Effective and unburdensome forecast of highway traffic flow with adaptive computing

Feature selection and fault-severity classification-based machine health assessment methodology for point machine sliding-chair degradation

Refined spectral clustering via embedded label propagation

Intelligent positioning approach for high speed trains based on ant colony optimization and machine learning algorithms

Fatigue evaluation of cable-stayed bridge steel deck based on predicted traffic flow growth

Development and implementation of support vector machine regression surrogate models for predicting groundwater pumping-induced saltwater intrusion into coastal aquifers

Traffic sign recognition with a small convolutional neural network

Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning

A novel non-supervised deep-learning-based network traffic control method for software defined wireless networks

Impacts of a multimodal mobility service on travel behavior and preferences: user insights from Munich's first mobility station

An implementation of type-2 fuzzy kernel based support vector machine algorithm for power quality events classification

Total cost of ownership and market share for hybrid and electric vehicles in the UK

Impact of data loss for prediction of traffic flow on an urban road using neural networks

Spatiotemporal cross-validation of urban traffic forecasting models

Optimization of a traffic control scheme for a post-disaster urban road network

A picture is worth a thousand words: share your real-time view on the road

Brillouin optical time-domain analyzer assisted by support vector machine for ultrafast temperature extraction

Forecasting air passenger traffic flow based on the two-phase learning model

Traffic flow prediction model based on deep belief network and genetic algorithm

Parallel computing method of deep belief networks and its application to traffic flow prediction


Acknowledgments The authors acknowledge the help from the university teachers.

Author Contributions All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Funding This research received no external funding.

Conflict of interest All authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent Informed consent was obtained from all individual participants included in the study.