Traffic congestion and travel time prediction based on historical congestion maps and identification of consensual days
Nicolas Chiabaut, Rémi Faitout
2020-11-10

Abstract: In this paper, a new practice-ready method for the real-time estimation of traffic conditions and travel times on highways is introduced. First, after a principal component analysis, observation days of a historical dataset are clustered. Two different methods are compared: a Gaussian Mixture Model and a k-means algorithm. The clustering results reveal that congestion maps of days of the same group have substantial similarity in their traffic conditions and dynamics. Such a map is a binary visualization of congestion propagation on the freeway, giving more importance to the traffic dynamics. Second, a consensus day is identified in each cluster as the most representative day of the community according to the congestion maps. Third, this information obtained from the historical data is used to predict traffic congestion propagation and travel times: the first measurements of a new day are used to determine which consensual day is the closest to this new day, and the past observations recorded for that consensual day are then used to predict future traffic conditions and travel times. This method is tested using ten months of data collected on a French freeway and shows very encouraging results.

Prediction of traffic states and travel time evolution is a key component of any traffic monitoring system and decision support system. Their accurate estimation is critical for freeway managers, especially when the network becomes congested. This problem has been extensively investigated in the transportation literature using model-based, simulation-based and data-driven approaches (Vlahogianni et al., 2005; Mori et al., 2015; Wang et al., 2018).
For short-term prediction, model-based and simulation-based approaches use traffic flow models in conjunction with data assimilation techniques, such as recursive Bayesian estimators, to predict the traffic states and the resulting travel times (Vlahogianni et al., 2014; Mori et al., 2015; Wang et al., 2006; Kumar et al., 2017). Most data-driven approaches use general-purpose parameterized mathematical models such as linear regression (Rice and Zwet, 2004), Kalman filtering (Van Lint, 2008; Nanthawichit et al., 2003), particle filters (Wang et al., 2007), support vector regression (Huang et al., 2014), random forests, Bayesian networks (Li et al., 2019), artificial neural networks (Adeli, 2001; Van Lint, 2008; Vlahogianni et al., 2005; Xu et al., 2020; Li et al., 2017) and many other techniques to capture and learn from data the correlations between traffic variables (speed, travel time) over space and time (Coifman, 2002; Polson and Sokolov, 2017; Ma et al., 2020). As pointed out by Yildirimoglu and Geroliminis (2013), who wrote a complete and useful state of the art of the estimation methods, these approaches suffer from various limitations. To quote only a few, a common limitation is that the spatiotemporal correlations are mainly selected artificially (Xu et al., 2020). Besides, some of the existing methods resort to experienced travel times, i.e. travel times calculated by traveling a trajectory through the velocity field. However, this information is rarely available in real time because the experienced travel time is usually greater than the prediction horizon. Consequently, the purpose of this paper is to predict the evolution of congestion, and therefore travel times, with a simple, fully explainable, and practice-ready method. The proposed method uses both historical and real-time traffic information to compute short-term congestion and travel time forecasts.
To this end, we use the concept of congestion map to consider queue propagation rather than the evolution of traffic state variables, such as density or speed. Figure 1 presents the mechanism of the global algorithm of the proposed method. The first step (step 1) consists in partitioning historical information into k clusters C_k presenting similar characteristics, based on the traffic patterns observed on the highway. As in Yildirimoglu and Geroliminis (2013), Nagendra and Khare (2003) and many other papers, we first resort to a Principal Component Analysis to reduce the number of variables. Then, a Gaussian Mixture Model and a k-means algorithm are used to group the historical data. Note that both approaches are used to evaluate the sensitivity of the estimation method to the clustering process. The proposed method then differs from the one of Yildirimoglu and Geroliminis (2013): rather than considering the global behavior of the clusters, we try to identify which day d_k within the cluster C_k is the most representative of the group. This so-called consensual day is determined based on the congestion maps of the clustered days, through a method derived from consensual learning techniques (Filkov and Skiena, 2004). The remainder of the paper is organized as follows: Section 2 presents the case study and the dataset used in the paper, Section 3 introduces the prediction methods, Section 4 is devoted to the clustering process and Section 5 to the analysis of the results, while Section 6 concludes. In this paper, we focus on the M6 highway near Lyon, France. Figure 2 depicts a sketch of the site. It is important to notice that this highway is used to access Lyon's city center through a tunnel, which is a recurrent, active bottleneck. Moreover, this highway is one of the most important in France and favored by holidaymakers because it links Paris to the south of France (French Riviera) and the Alps; it is thus also called the Motorway of the Sun.
Consequently, major traffic jams are always observed during holidays. Regularly spaced loop detectors can be found on this highway section. These detectors provide average flow, speed and occupancy rate per lane every minute. In this study, we mainly focus on data from 9 detectors of a 7 km long section, see Figure 2. Accordingly, 9 sections of length ∆l, roughly centered on the detectors, have been defined. The speed limit varies from 90 km/h to 70 km/h within this section. In the remainder of the paper, detectors are labeled by increasing position from 0 to 8. We consider that traffic conditions between detectors can be interpolated using the observations of the closest detector. All data from January 2018 to October 2018 is available (except May). We partitioned the data by day from 6:00 to 22:00 (960 observations per day). Finally, the data has been roughly cleaned up to remove unrealistic values and acquisition problems. To focus on traffic dynamics rather than speed evolution, we use the concept of congestion map. Indeed, recorded speed values can vary because of many local phenomena (such as variations in driving behaviors, measurement noise, etc.) without any correlation with the macroscopic flow dynamics that rule traffic conditions. One potential method, but far from perfect, to reduce this bias is to average the observations. Here, we opt for a more drastic method and only consider two possible traffic states: free-flow and congestion. For each loop detector l ∈ [0, 8], we therefore consider a variable x_l that, at time t, is equal to 1 if the observed speed v_l(t) is lower than a congested speed threshold v_cong (fixed here at 40 km/h) and equal to 0 otherwise. This makes it possible to compute the map M_d of day d as a Boolean matrix of size "number of detectors" x "number of observations per day", composed of the elements x_l(t).
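The congestion map construction described above can be sketched in a few lines. This is a minimal illustration only: the array names, toy speed values and dimensions are made up, not taken from the paper's dataset.

```python
import numpy as np

V_CONG = 40.0  # congested-speed threshold used in the paper (km/h)

def congestion_map(speeds, v_cong=V_CONG):
    """Return the Boolean map M[l, t]: 1 where the detector is congested
    (observed speed below v_cong), 0 otherwise."""
    return (np.asarray(speeds) < v_cong).astype(np.uint8)

# Toy example: 3 detectors, 5 one-minute observations (speeds in km/h).
speeds = np.array([[90, 85, 35, 30, 80],
                   [70, 38, 25, 60, 75],
                   [88, 90, 91, 89, 87]])
M = congestion_map(speeds)
```

In the paper, the same thresholding is applied to each of the 9 detectors over the 960 one-minute observations of a day, yielding one 9 x 960 map per day.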
The regularity of traffic events makes the information obtained from historical data, and what can be learned from past situations, very useful. Consequently, it is worthwhile to classify the different observation days. Because the size of the initial dataset is very large (8640 variables for a single day), the first step, to speed up the clustering methods and obtain accurate results, is to perform a Principal Component Analysis (PCA) to reduce the dimensions of the observations (Nagendra and Khare, 2003). Notice that we consider here the speeds v_l(t), for l ∈ [0, 8] and t ∈ [0, 960], as the input vector of the PCA. Since PCA is a standard method, we do not study its results in detail. The purpose of the second step of the method is then to cluster similar days of the historical dataset. To this end, we use two classical clustering algorithms: (i) k-means with a Euclidean distance between observations and (ii) a Gaussian Mixture Model (GMM), as proposed in Yildirimoglu and Geroliminis (2013). (i) The k-means algorithm is one of the most popular unsupervised learning methods; it aims to gather the data into groups of equal variance that minimize the inertia, i.e., the within-cluster sum of squares. Computation is fast and results are easy to interpret, but k-means implicitly assumes that all clusters are spherical. This shape assumption can introduce a strong bias, especially for observations of highly non-linear phenomena. (ii) In the GMM method, the underlying idea is to consider that the similar days constituting the different clusters follow normal distributions. Consequently, the set of clusters, i.e., the partition, is ruled by a GMM. This clustering method gives importance to the distribution of the data points and not only to the distance between them. Consequently, GMM is well adapted to our case because it provides clusters that may have different sizes and different correlations within them. Note that a full covariance matrix is assumed in the GMM.
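The PCA-then-cluster pipeline above can be sketched with scikit-learn. The synthetic "days" below (two well-separated speed regimes) and all dimensions are illustrative assumptions, chosen only so the two algorithms have something clean to recover.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic dataset: 60 "days", each a flattened speed profile of
# 9 detectors x 96 time steps (the paper uses 960 one-minute steps).
free_flow = rng.normal(85.0, 5.0, size=(30, 9 * 96))
congested = rng.normal(35.0, 5.0, size=(30, 9 * 96))
X = np.vstack([free_flow, congested])

# Step 1: PCA to reduce the dimensionality before clustering.
Z = PCA(n_components=5, random_state=0).fit_transform(X)

# Step 2a: k-means (Euclidean distance, implicitly spherical clusters).
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Step 2b: GMM with a full covariance matrix, as assumed in the paper.
gmm_labels = GaussianMixture(n_components=2, covariance_type="full",
                             random_state=0).fit_predict(Z)
```

On this toy data both methods recover the two regimes; on real traffic data their partitions differ, which is precisely what the paper exploits to test the sensitivity of the estimation method to the clustering process.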
Now, the idea is to determine which day of a cluster is the most representative of the group. As already explained, the motivation is to use the representative days as the prediction for the coming days, see Figure 1. Here, we consider two observations as concomitant when, at a given detector l and time t, both observations for days d and p are in the same state (free-flow / free-flow or congested / congested), i.e. M_d(l, t) = M_p(l, t). The distance that we use, a Rand index between congestion maps, can be formulated as:

R(d, p) = (1 / (L T)) Σ_l Σ_t s_{l,t}(d, p),

where L is the number of detectors, T is the number of observations per day, and s_{l,t}(d, p) = 1 if M_d(l, t) = M_p(l, t) and 0 otherwise. Then, we define the consensual day d_k of a given cluster C_k as the one that maximizes the sum of the Rand indices within the cluster:

d_k = argmax_{d ∈ C_k} Σ_{p ∈ C_k} R(d, p).

Consequently, the set of consensual days D_k can be determined for the whole historical dataset. We now take advantage of the consensual days to predict both congestion and travel time evolution in real time. Consider a new day of observation p. The time interval is discretized into periods of δt minutes. At time t, we try to determine which consensual day d_k ∈ D_k is the closest to this new day p. To this end, we only consider the last ∆t observations and build a partial congestion map. This map is compared to the partial maps extracted from the congestion maps of the consensual days. The consensual day d*_p(t) that has the maximal Rand index with day p, based on the partial maps, is selected:

d*_p(t) = argmax_{d_k ∈ D_k} R_[t−∆t, t](d_k, p).

Now that d*_p(t) has been identified, we use the historical data x_l(t) and v_l(t) of d*_p to predict the variables of day p for the next time step δt. This process is iterated every δt, and congestion and speed maps can be built by gathering the data. Notice that the optimal consensual day d*_p can (and surely will) change with time t. The duration δt (prediction horizon) and ∆t (learning period) belong to the parameters of the proposed method.
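The Rand index between maps, the consensual-day selection and the real-time matching can be sketched as follows. This is a minimal sketch assuming congestion maps are stored as NumPy arrays; the toy maps are illustrative.

```python
import numpy as np

def rand_index(Ma, Mb):
    """Share of concomitant cells: both maps in the same state (FF/FF or C/C)."""
    Ma, Mb = np.asarray(Ma), np.asarray(Mb)
    return float((Ma == Mb).mean())

def consensual_day(maps):
    """Index of the day maximizing the sum of Rand indices within the cluster."""
    scores = [sum(rand_index(maps[d], maps[p]) for p in range(len(maps)))
              for d in range(len(maps))]
    return int(np.argmax(scores))

def closest_consensual(partial_map, consensual_maps, t, dt_learn):
    """Match the last dt_learn observations of a new day against the
    corresponding window of each consensual day's map."""
    window = slice(t - dt_learn, t)
    scores = [rand_index(partial_map[:, window], M[:, window])
              for M in consensual_maps]
    return int(np.argmax(scores))

# Toy cluster of three days (2 detectors x 6 time steps).
maps = [np.array([[0, 0, 1, 1, 0, 0], [0, 0, 0, 1, 0, 0]]),
        np.array([[0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0]]),
        np.array([[0, 1, 1, 1, 1, 0], [0, 0, 1, 1, 0, 0]])]
k = consensual_day(maps)  # day 1 sits "in the middle" of the cluster
```

In the full method, `consensual_day` runs offline once per cluster, while `closest_consensual` runs online every δt minutes with the learning window ∆t.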
Thus, the predictions of the congestion map and of the speeds of day p are, respectively:

M̂_p(l, t + δt) = M_{d*_p(t)}(l, t + δt) and v̂_p(l, t + δt) = v_{d*_p(t)}(l, t + δt),

where M_{d*_p(t)}(l, t) is the element of the observed congestion map of the consensual day d*_p(t) at time t for detector l, and v_{d*_p(t)}(l, t) is the observed speed at time t on the consensual day d*_p(t) for detector l. The travel time τ at time t is calculated as:

τ(t) = Σ_l ∆x_l / v_l(t),

where ∆x_l is the length of section l, see Figure 2. It is important to notice that this corresponds to the definition of instantaneous travel times: the travel time at time t is calculated based on the speeds at the different detector locations at time t (Yeon et al., 2008). It may introduce some bias compared to experienced travel times but, in our case, this bias is very limited because of the relatively short length of the case study (less than 10 km). In order to evaluate the global prediction method proposed in the paper, a cross-validation procedure can be used. A simple holdout method is considered by randomly selecting 75% of the initial data as the learning set. This training set is clustered into K groups, and the associated K consensual days are determined. Then, travel times are predicted for the remaining 25% and compared with the observations to evaluate and validate the method. Details are presented in the following section. As already mentioned, our method shares several common features with the work of Yildirimoglu and Geroliminis (2013). Consequently, this section proposes a short comparison of the two approaches. To summarize, the method of Yildirimoglu and Geroliminis (2013) aims to identify groups of days presenting similar traffic conditions to build stochastic congestion maps, identifying the probability that a section of the test case is congested. According to the newly available information, the stochastic congestion maps are used to revise the a priori knowledge of the state.
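The instantaneous travel time formula above is a direct sum over sections; a minimal sketch, with illustrative section lengths and speeds:

```python
def instantaneous_travel_time(section_lengths_km, speeds_kmh):
    """Instantaneous travel time in minutes at one instant t:
    tau(t) = sum_l dx_l / v_l(t), using the speed observed at each
    detector at the same instant."""
    hours = sum(dx / v for dx, v in zip(section_lengths_km, speeds_kmh))
    return 60.0 * hours

# Toy 3-section corridor: a slow middle section dominates the travel time.
tau = instantaneous_travel_time([1.0, 2.0, 1.0], [60.0, 30.0, 60.0])
```

Note the bias the paper mentions: this sums speeds observed at a single instant, whereas an experienced travel time would follow the vehicle through speeds observed at successive instants.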
Their work is organized as follows:

• Historical data:
  - Groups of similar days are identified with a PCA + GMM combination (speed is the observation vector);
  - Stochastic congestion maps are produced for each group to highlight blocks of recurrent congestion;
• Real time: the goal is to identify the closest situation in the historical data to determine which bottlenecks are active. Potential active bottlenecks are then integrated in the processing of real-time measurements to produce experienced travel times.
  - Identification of the active blocks based on the comparison of the real-time congestion map with the stochastic congestion maps;
  - Travel times are produced by combining the real-time speed measurements at the different detectors with a correction based on the identification of potential bottlenecks.

In comparison, our method aims to identify groups of days presenting similar traffic conditions in order to determine a single representative day for each group. The measurements of these representative days are then used in real time to predict future traffic conditions. The newly available information is only used to determine the closest representative day and is directly processed to produce travel time predictions. The method can be summarized as follows:

• Historical data:
  - Groups of similar days are identified with a PCA + GMM / k-means combination (speed is the observation vector);
  - Identification of the most representative day of each group by consensual learning on the congestion maps;
• Real time: the goal is to identify the closest consensual day and use its past observations to predict future conditions.

For both approaches, the number of groups K has to be fixed exogenously, i.e., before performing the clustering of the dataset. Even if there is no definitive answer, several methods exist to determine the optimal number of clusters.
These methods are either based on the minimization/maximization of a criterion (such as the elbow or average silhouette methods (Rousseeuw, 1987)) or on a statistical test (such as the gap statistic method, the Bayesian information criterion or the Akaike information criterion). However, these criteria are frequently not consistent with one another, and this was the case in our study. Consequently, we have decided to tailor our metrics to determine the optimal number of clusters K needed to predict the evolution of congestion and travel times. The prerequisite is to determine clusters that gather a sufficient number of days (for example, 5% of the test dataset) with similar traffic conditions (intra-cluster homogeneity) and that are different enough from one another to justify distinct groups (inter-cluster dissimilarity). To this end, three metrics have been developed. The intra-cluster homogeneity is appraised by calculating the average of the Rand index inside a cluster:

H_intra = (1/K) Σ_{k=1}^{K} (1/|C_k|²) Σ_{p,q ∈ C_k} R(p, q),

where K is the number of clusters, C_k is cluster k, d_k is the consensual day of cluster C_k, p and q are days belonging to C_k, and |C_k| is the cardinality of cluster C_k. The inter-cluster dissimilarity is evaluated by determining the average of the Rand index between each pair of consensual days:

H_inter = (2 / (K(K − 1))) Σ_{k<k'} R(d_k, d_{k'}).

Finally, we also compare the number of clusters n_min gathering more than 5 days to the targeted number of clusters K. In order to perform this analysis, the clustering process is iterated 20 times to ensure the generality of the results. Then, the averages of the three metrics presented above are calculated. Figure 4a compares the intra-cluster homogeneity with the inter-cluster dissimilarity for various values of K for both approaches. It reveals that the inter-cluster dissimilarity is stable for K > 10. Simultaneously, the intra-cluster homogeneity continues to increase, but at a very low rate.
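The two cluster-quality metrics can be sketched as follows. Note the normalization is an assumption here (average over distinct pairs); the paper's exact normalization was lost in extraction, but the ordering of clusterings it induces is the same.

```python
import numpy as np

def rand_index(Ma, Mb):
    """Share of cells in the same state in both congestion maps."""
    return float((np.asarray(Ma) == np.asarray(Mb)).mean())

def intra_cluster_homogeneity(clusters):
    """Average pairwise Rand index inside each cluster, averaged over clusters."""
    per_cluster = []
    for maps in clusters:
        pairs = [(p, q) for p in range(len(maps)) for q in range(p + 1, len(maps))]
        per_cluster.append(np.mean([rand_index(maps[p], maps[q]) for p, q in pairs]))
    return float(np.mean(per_cluster))

def inter_cluster_dissimilarity(consensual_maps):
    """Average Rand index over each pair of consensual days
    (a low value means the clusters are dissimilar)."""
    K = len(consensual_maps)
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    return float(np.mean([rand_index(consensual_maps[i], consensual_maps[j])
                          for i, j in pairs]))

# Toy example: one perfectly homogeneous cluster, one half-homogeneous cluster.
clusters = [[np.zeros((2, 2)), np.zeros((2, 2))],
            [np.ones((2, 2)), np.array([[1, 1], [0, 0]])]]
intra = intra_cluster_homogeneity(clusters)
inter = inter_cluster_dissimilarity([np.zeros((2, 2)), np.ones((2, 2))])
```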
In the meantime, Figure 4b depicts n_min, the number of clusters gathering more than 5 days, as a function of K. We also propose the evolution of the silhouette score with the number of clusters (see Figure 4c). It clearly shows that, for K larger than 30, both methods are unable to identify large clusters; it is even worse for high values of K. Consequently, it appears that n_opt = 18 constitutes a good balance to gather the data of our test site (black cross in Figures 4a, b, and c). In particular, we can almost observe a local optimum of the average silhouette width between 15 and 20 clusters. n_opt = 18 ensures a sufficient number of clusters and a sufficient number of days per cluster (more than 13 days on average) with an acceptable intra-cluster homogeneity and inter-cluster dissimilarity. It also means that clusters can be modified or removed retrospectively without changing the metrics significantly. In addition to these averaged metrics, the stability of the clustering results is also studied. To this end, the Rand index between successive partitions is calculated. Consequently, if (C_{n−1}) is the partition into n − 1 clusters and (C_n) the partition into n clusters, the Rand index can be expressed as follows:

R = (a + b) / (a + b + c + d),

where:
• a is the number of pairs of days that are in the same cluster in (C_{n−1}) and in (C_n);
• b is the number of pairs of days that are in different clusters in both (C_{n−1}) and (C_n);
• c is the number of pairs of days that are in the same cluster in (C_{n−1}) and in different clusters in (C_n);
• d is the number of pairs of days that are in different clusters in (C_{n−1}) and in the same cluster in (C_n).

Figure 5 shows the evolution of the Rand index with the number of clusters for the two clustering methods. From 10 to 50 clusters, less than 5% of days change group when the number of clusters increases. The results are thus reasonably stable and confirm that n_opt = 18 is a satisfying trade-off between the different metrics.
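The partition-level Rand index defined above (not to be confused with the Rand index between congestion maps) can be sketched as:

```python
from itertools import combinations

def partition_rand_index(labels_prev, labels_next):
    """Rand index between two partitions of the same set of days.

    a: pairs grouped together in both partitions;
    b: pairs separated in both partitions;
    c: together before, separated after;
    d: separated before, together after.
    R = (a + b) / (a + b + c + d)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_prev)), 2):
        same_prev = labels_prev[i] == labels_prev[j]
        same_next = labels_next[i] == labels_next[j]
        if same_prev and same_next:
            a += 1
        elif not same_prev and not same_next:
            b += 1
        elif same_prev:
            c += 1
        else:
            d += 1
    return (a + b) / (a + b + c + d)

# Relabeled but identical grouping: every pair agrees, so R = 1.
r_same = partition_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A genuinely different grouping scores lower.
r_diff = partition_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only pair co-membership matters, the index is invariant to how cluster labels are numbered, which is exactly what is needed to compare successive partitions.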
First of all, Figures 6a and 6b show the congestion maps of the consensual days D_k obtained with the k-means method and the GMM method, respectively. Interestingly, the patterns are quite different from one day to the other, for both methods. Note that the d_k have been sorted by decreasing size of C_k, i.e. C_i gathers more days than C_{i+1}, and that the consensual days of the k-means and the GMM methods are indexed separately. The congestion maps of d_1 for the k-means clustering (Figure 6a) and of d_1 for the GMM method (Figure 6b) clearly correspond to the cluster of free-flow days. Situations with both morning and evening peak hours can be identified (see, for example, d_3 and d_2), as well as situations with only a morning peak hour (see, for example, d_4 and d_15). It is also important to notice that the two methods do not lead to exactly the same consensual days (nor the same clusters): in this case, they only have 7 consensual days in common. Figure 7 shows the different congestion maps of the days gathered into cluster C_3. Visually, the pattern is similar, always with morning and evening peak hours. Interpreting the results of clustering methods is always a difficult task, because a relevant balance must be found between a trivial classification and useless, complex justifications of hypothetical links between observations grouped into the same cluster. In our case, the number of attributes available to analyze and explain the configuration of the computed clusters is very limited. Consequently, the post-clustering analysis mainly focuses on the weekday and the month of the observations gathered into the same group. Finally, cluster C_4 groups days with only evening congestion, which may explain the majority of Sundays in this set. Because analyzing the results cluster by cluster is a tedious task, a convenient approach is to use an alluvial diagram (see the corresponding figure). The global prediction method is now tested for our case study.
We mainly focus on two outputs: the predicted congestion maps and the travel time series. As already explained, the data has been divided into a training set (75% of the initial data) and a validation set (the remaining 25%). Moreover, we decided to only focus on the k-means method for the sake of brevity in the presentation of the results. We first compare our method with existing approaches to identify its optimal domain of application. This task is tough for many reasons. Notably, it necessitates selecting an existing method that is accurate enough and does not require tremendous work to be developed and calibrated. Moreover, existing methods have different objectives and domains of application. Consequently, we decided to compare our approach to a naive instantaneous method and a historical average method. Besides, a comparison with two other versions of our method is performed to demonstrate the predictive power of our approach. Therefore, we define the following methods:

• M0 is a naive approach consisting in shifting the observation made at time t to a horizon δt; consequently, this method will be almost perfect for short δt;
• M1 is a historical average method that consists in calculating average values of the historical data for each day of the week and each time period;
• M2 is the original method proposed in the paper (Figure 1), based on the clustering of the historical data and the identification of consensual days;
• M3 is the same as M2 except that averaged congestion maps are calculated for each group instead of identifying a consensual day;
• M4 follows the same process as M2 except that the historical data is not clustered: each day of the historical dataset can be used as a prediction.

To compare these approaches, we use the following metrics:
• congestion prediction: Rand index, i.e.
accuracy, and F1-score between the predicted congestion maps and the ground truth;
• travel time prediction: RMSE between prediction and observation.

The test set is composed of the 56 days that have not been used to perform the clustering (167 days in the historical dataset). The proposed methods M2, M3, and M4 outperform the naive instantaneous method M0 for horizons bigger than 30 minutes. At the same time, these methods are better than the historical approach M1 for horizons smaller than 1 hour. Figure 11c depicts similar results for the MSE between predicted and observed travel times, except that M1 is almost the worst of the five tested methods. The comparison sheds light on the optimal domain of application of our method. Consequently, a horizon of δt = 60 minutes is used in the remainder of the paper. It is also important to notice that methods M2, M3, and M4 have similar performance concerning the prediction of congestion propagation. This is not surprising because they are three variants of the same method. Besides, methods M2 and M4 are slightly better than M3 for the travel times. However, we prefer to retain method M2 because the clustering makes it more understandable (predictions can be related to qualitative transportation scenarios), and the use of a consensual day produces more realistic travel times when focusing on the time series of the predicted travel times. The ability of the proposed method to anticipate congestion propagation is now appraised by comparing the predicted congestion maps with the real observations. Remember that the prediction method has three parameters: the duration ∆t that is used to make the prediction, the horizon δt for which the prediction is made, and the congested speed threshold (fixed here at 40 km/h). Figure 12 shows the observed maps, the predicted congestion maps (for ∆t = 15 min and δt = 60 min), and the difference between these maps for 6 randomly selected days of the testing sample.
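The two baselines can be sketched as follows. This is a simplified sketch: the data layout (a per-weekday list of detector-by-time speed arrays) and all toy values are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def predict_m0(speed_map, t):
    """M0, naive instantaneous method: reuse the speeds observed at time t
    as the prediction for time t + horizon."""
    return speed_map[:, t]

def predict_m1(history_by_weekday, weekday, t_pred):
    """M1, historical average: mean of past speeds for this weekday
    at the predicted time period."""
    stacked = np.stack(history_by_weekday[weekday])  # (days, detectors, steps)
    return stacked[:, :, t_pred].mean(axis=0)

# Toy data: 2 detectors, 3 time steps for the current day.
speed_map = np.array([[50.0, 60.0, 70.0],
                      [80.0, 80.0, 80.0]])
m0 = predict_m0(speed_map, t=1)

# Two past Mondays (weekday 0), 2 detectors x 2 time steps each.
history = {0: [np.array([[60.0, 62.0], [80.0, 84.0]]),
               np.array([[64.0, 66.0], [82.0, 86.0]])]}
m1 = predict_m1(history, weekday=0, t_pred=1)
```

M0 ignores all dynamics (hence its strength only at short horizons), while M1 ignores the current day entirely (hence its weakness at short horizons); the consensual-day method sits between the two.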
Note that the predicted congestion maps result from the repeated predictions at every time step (one minute in our case). Visual inspection reveals that the differences between prediction and ground truth are very low, and this for significantly different shapes of congestion propagation (in Figure 12c, blue is when congestion propagation is predicted but the observation is free-flow, red for the opposite). This qualitative analysis is very encouraging.

Figure 12: Comparison of (a) observed and (b) predicted congestion maps (black for free-flow (FF) conditions, beige for congestion (C) conditions), (c) the difference between them (blue for C/FF and red for FF/C, white and gray for correct predictions) and (d) operational metrics of prediction accuracy (red for +1, blue for −1, black for −2, beige for +2 and gray for correct predictions).

To go further, we calculated the accuracy of the predictions. The values are particularly good, since they are very close to 100% (see the titles of Figure 12). This score might be imputed to the free-flow situations, which are the most frequent in the congestion maps and easy to predict. We also calculated the F1-scores of the predictions, which are all satisfactory even though this metric is more sensitive to wrong predictions. However, it should be noted that the F1-score is calculated on the congested traffic conditions only. This may introduce a bias when comparing days with significantly different volumes of congestion. However, the forthcoming analysis of travel time evolution will confirm that the results of the method are encouraging. Keeping in mind that the method proposed here is tailored to be practice-ready and to answer operational needs, specific metrics are developed to evaluate the prediction accuracy. The confidence that an operator can place in a decision support system, such as a traffic evolution predictor, is often based on simpler indicators than those used in a research paper.
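The cell-wise accuracy and the congested-class F1-score can be computed directly by flattening both maps; the toy maps below are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Toy observed / predicted congestion maps (2 detectors x 4 steps).
observed = np.array([[0, 0, 1, 1],
                     [0, 1, 1, 0]])
predicted = np.array([[0, 0, 1, 0],
                      [0, 1, 1, 0]])

y_true, y_pred = observed.ravel(), predicted.ravel()
accuracy = accuracy_score(y_true, y_pred)
# F1 computed on the congested class (label 1) only, as in the paper.
f1 = f1_score(y_true, y_pred, pos_label=1)
```

As the text notes, accuracy is inflated by the dominant free-flow cells, whereas the F1-score penalizes missed or spurious congestion, which is why the paper reports both.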
A potential criterion is the decision support system's ability to predict in the same direction what the operator can observe in real time. For example, if an increase of the congestion is predicted for a horizon δt, will the operator really observe the congestion propagation in his or her monitoring system a time δt later? Consequently, only predictions of variations are now evaluated. To this end, Figure 12d shows the colormap of ∆M_d − ∆M*_d for the 6 randomly selected days. It clearly reveals that the method makes only a few errors about the direction of evolution. In particular, compared to the raw differences between prediction and observation (Figure 12c), the errors do not accumulate. In addition to this qualitative analysis, a global indicator can be computed: the ratio ρ of accurate predictions over the total number of observations. The distribution of ρ for the whole testing set is not shown because it is completely centered on the mean value (ρ = 98.2%) with a minimal standard deviation (0.4%). Therefore, the accuracy of the prediction of the evolution of traffic conditions is excellent. This is very encouraging for the potential integration of the method into an operational decision support system, as will be the case for our test study. As already mentioned, the proposed method can predict travel times by using the observed speeds of the consensual days. The main benefit of this approach is to produce realistic travel times, because the predictions come from past observations. Figure 13a shows the travel time series for the 6 previously studied days. The orange curves correspond to the predictions, whereas the blue ones are the observations. Note that a horizon δt = 60 min and a learning duration ∆t = 15 min have been used. Figure 13b shows the distributions of the errors between predictions and observations. The proportion of predictions with a precision below 2 minutes ranges from 40% to 79%.
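The variation-based indicator ρ can be sketched as follows, assuming ∆M denotes the difference of the congestion map along the time axis; the toy maps are illustrative.

```python
import numpy as np

def direction_accuracy(observed, predicted):
    """Share of cells whose predicted change along time (dM, one step to the
    next) matches the observed change: the operational indicator rho."""
    d_obs = np.diff(observed.astype(int), axis=1)
    d_pred = np.diff(predicted.astype(int), axis=1)
    return float((d_obs == d_pred).mean())

# Toy observed / predicted congestion maps (2 detectors x 4 steps).
observed = np.array([[0, 1, 1, 0],
                     [0, 0, 1, 1]])
predicted = np.array([[0, 1, 0, 0],
                      [0, 0, 1, 1]])
rho = direction_accuracy(observed, predicted)
```

This indicator only asks whether each transition (onset or dissipation of congestion) is predicted in the right direction, which is closer to what an operator checks on a monitoring screen than a cell-wise score.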
If we focus on a 3-minute window, this proportion increases to between 68% and 87%. Finally, Figure 13c highlights the variation of the optimal consensual day determined by the prediction method. It shows that a large subset of the consensual days is used to predict travel times throughout the day. Prediction of congestion propagation and travel time evolution is still a topic actively studied in the literature. This paper makes its contribution by proposing a simple method, whose main benefit is to be fully explainable compared to recent approaches based on machine learning or artificial intelligence methods. The key component is the concept of congestion map, which is a binary observation of the traffic states on a highway. Using this proxy reinforces the importance of focusing on traffic dynamics rather than on the numeric values of observation variables. Thus, the historical dataset of a French 10 km long freeway is classified into groups of days presenting similar traffic states. The second innovation of the proposed method is to identify a consensual day for each cluster. Based on a distance derived from the Rand index, the objective is to determine the day that is the most representative of the cluster. This makes it possible to recover almost all the expected traffic situations of this highway: morning and evening peak hours, only morning/evening peak hours, all-day congestion, free-flow days, holiday traffic, etc. Once the consensual days have been determined, the method can be applied in real time to predict congestion propagation and travel time evolution. Over a real-time learning period, the observations of a new day are compared to the consensual days' congestion maps. The closest one is identified, and the congestion map and speeds that have been observed for this specific day are used to predict the behavior of the new day for a given horizon.
This very simple method gives encouraging results for both congestion and travel time evolution. In particular, the comparison with naive methods (instantaneous and historical average) reveals that the proposed model is useful for longer prediction horizons. Various future directions can be pursued. The methodology can be improved by identifying the best duration over which to compare congestion maps, the most accurate prediction horizon, and the congested speed threshold. To perform the study, we naively tested different values, but a sensitivity analysis could be conducted. A second improvement could be to use congestion maps that no longer depend on the time of the day. The idea is to focus only on the shape of the maps, i.e., shockwave profiles. These claims still need to be researched and validated.

References:
- Neural networks in civil engineering
- Estimating travel times and vehicle trajectories on freeways using dual loop detectors (Transportation Research Part A: Policy and Practice)
- Integrating microarray data by consensus clustering
- Deep architecture for traffic flow prediction: deep belief networks with multitask learning
- Bus travel time prediction using a time-space discretization approach
- Graph convolutional recurrent neural network: data-driven traffic forecasting
- Building sparse models for traffic flow prediction: an empirical comparison between statistical heuristics and geometric heuristics for Bayesian network approaches
- Revealing the day-to-day regularity of urban congestion patterns with 3D speed maps
- Hybrid machine learning algorithm and statistical time series model for network-wide traffic forecast
- A review of travel time estimation and forecasting for Advanced Traveller Information Systems
- Principal component analysis of urban traffic characteristics and meteorological data
- Application of probe-vehicle data for real-time traffic-state estimation and short-term travel-time prediction on a freeway
- Deep learning for short-term traffic flow prediction
- A simple and effective method for predicting travel times on freeways
- Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
- Online learning solutions for freeway travel time prediction
- Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach
- Short-term traffic forecasting: where we are and where we're going
- A novel approach to estimate freeway traffic state: parallel computing and improved Kalman filter
- RENAISSANCE - a unified macroscopic model-based approach to real-time freeway network traffic surveillance
- Real-time freeway traffic state estimation based on extended Kalman filter: a case study
- Gegan: a novel deep learning framework for road traffic state estimation
- Travel time estimation on a freeway using discrete time Markov chains
- Experienced travel time prediction for congested freeways

Acknowledgments: This research work was carried out as part of the "Lyon Covoiturage Experimentation - LCE" project at the Technological Research Institute SystemX, and was supported with public funding within the scope of the French program "Investissements d'Avenir". The authors thank the members of IRT SystemX for the fruitful discussions about data clustering, and the Métropole de Lyon, which provided the data for this study. Plenty of other interesting datasets can be found at: https://data.grandlyon.com/ This paper was mainly written during the long nights of the Covid-19 lockdown.