key: cord-0044798-m0xa44mk authors: Angarita-Zapata, Juan S.; Masegosa, Antonio D.; Triguero, Isaac title: General-Purpose Automated Machine Learning for Transportation: A Case Study of Auto-sklearn for Traffic Forecasting date: 2020-05-15 journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems DOI: 10.1007/978-3-030-50143-3_57 sha: 7ec1138156beafba4f562355b9b45c2c60529dc2 doc_id: 44798 cord_uid: m0xa44mk

Currently, there are no guidelines to determine which machine learning pipelines (i.e. the workflow from data preprocessing to model selection and validation) are the most suitable to approach Traffic Forecasting (TF) problems. Although automated machine learning (AutoML) has proved to be successful in dealing with the model selection problem in other application areas, only a few papers have explored the performance of general-purpose AutoML methods, purely based on optimisation, when tackling TF. In this paper, we provide a thorough exploration of the benefits of Auto-sklearn for TF, as a general-purpose AutoML method that follows a hybrid search strategy combining optimisation with meta-learning and ensemble learning. Particularly, we focus on how well Auto-sklearn is able to recommend competitive machine learning pipelines to forecast traffic, modelled as a TF multi-class imbalanced classification problem, along different time horizons at two spatial scales (point and road segment) and two environments (freeway and urban). Concretely, we test the following scenarios: I) a hybrid search strategy with the three components (optimisation, meta-learning, ensemble learning), II) a strategy based on meta-learning and ensemble learning, and III) a strategy based on the estimation of the best performing pipeline from those suggested by the meta-learning. Experimental results show that the meta-learning component of Auto-sklearn does not work properly on TF problems and, on the other hand, that the optimisation contributes little to the final performance of the predictions.

A well-established strategy to tackle congestion is the design, development and implementation of TF systems. TF can be defined as the prediction of near-future traffic conditions (e.g. travel time) [16]. The recent emergence of telecommunications technologies integrated into transportation infrastructure generates vast volumes of traffic data. This unprecedented data availability and growing computational capacities have increased the use of Machine Learning (ML) to address TF. From a ML perspective, TF is focused on building a predictive model from historical data to make predictions of traffic measures on new and unseen data. In spite of the aforementioned progress, different ML algorithms and preprocessing approaches may be more appropriate for different kinds of traffic data. Determining the best pipeline (sequence of data preprocessing techniques and a learning algorithm) for making traffic predictions is not a trivial task. In the ML area, this challenge is known as the Model Selection Problem (MSP), and Automated Machine Learning (AutoML) has been one of the most successful approaches addressing it so far. AutoML aims at automatically finding the best combination of preprocessing techniques, ML algorithm and hyperparameters that maximises a performance measure on given data, without being specialised in the problem domain this data comes from.
The search strategy to find the mentioned combination can be based either on a "pure" optimisation process that tests different promising combinations from a predefined base of preprocessing and learning algorithms [10]; or on a hybrid search where the optimisation is complemented with learning strategies such as meta-learning [3]. In the latter case, the learning approach is in charge of systematically observing how different ML pipelines perform on a wide range of tasks and then exploiting this experience to learn new tasks faster [14]. Roughly speaking, it can be seen as using ML to design ML algorithms. AutoML methods have successfully approached the MSP in other areas [8, 18]; however, they have hardly been explored in TF [1]. In TF, current progress focuses only on AutoML methods designed purely on optimisation approaches, thus leaving aside the study of AutoML methods with hybrid search strategies. With this idea in mind, the contribution of this paper is to study the benefits, in terms of performance and computational cost, of hybrid AutoML for TF. We use Auto-sklearn [3], a state-of-the-art hybrid AutoML method whose pipeline search strategy combines Bayesian optimisation, meta-learning and ensemble learning. To accomplish this objective, we use as a benchmark a multi-class imbalanced classification problem posed over different time horizons and over freeway and urban environments. Under these traffic forecasting settings, we explore the performance of Auto-sklearn's components through three scenarios: I) a hybrid search strategy that uses its three components (optimisation, meta-learning, ensemble learning), II) a meta-learning strategy combined with ensemble learning, and III) a strategy based on the estimation of the best performing pipeline from those suggested by the meta-learning.

The rest of this paper is structured as follows. Sections 2 and 3 present background and related work about AutoML methods in TF. Section 4 describes the methodology followed in this paper. Then, Sect. 5 analyses the main results obtained. Finally, conclusions are discussed in Sect. 6.

This section reviews literature related to AutoML in the context of TF. We start by presenting the foundations of general-purpose AutoML methods, and then Sect. 2.2 reviews Auto-sklearn, the state-of-the-art hybrid AutoML method used in this research.

According to [18], a ML pipeline P can be defined as a combination of algorithms A that transforms input data X into target values Y. Let A be defined as $A = \{A_{preprocessing}, A_{feature}, A_{algorithm}\}$, wherein $A_{preprocessing}$ is a subset of preprocessing techniques, $A_{feature}$ a subset of feature engineering methods, and $A_{algorithm}$ a ML algorithm with a hyperparameter configuration $\lambda_i \in \Lambda$. In order to build a ML pipeline with this structure, human effort and high computational capacities are needed, because there is no single pipeline that achieves good performance on every learning problem [6, 17]. The construction is usually done by iterative trial and error, which means that the success of ML comes at a great price [17]. AutoML is an emerging sub-area of ML that seeks to automate the ML workflow from data preprocessing to model validation [5]. It reduces human bias and computational costs by making the construction of ML applications more efficient.
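As an illustration of this fixed pipeline structure (this sketch is not taken from the paper; the particular imputer, scaler, feature selector and classifier, and their hyperparameter values, are arbitrary choices), a minimal scikit-learn pipeline could look as follows:

```python
# Illustrative only: a fixed-structure pipeline P with components
# A_preprocessing, A_feature and A_algorithm, plus a hyperparameter
# configuration lambda, in the spirit of the definition above.
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline(steps=[
    ("imputation", SimpleImputer(strategy="median")),       # A_preprocessing
    ("scaling", StandardScaler()),                          # A_preprocessing
    ("feature", SelectKBest(score_func=f_classif, k=10)),   # A_feature
    ("algorithm", RandomForestClassifier(                   # A_algorithm with lambda
        n_estimators=200, max_depth=10, random_state=0)),
])

# Typical use on a labelled data-set (X_train, y_train, X_test are placeholders):
# pipeline.fit(X_train, y_train)
# y_pred = pipeline.predict(X_test)
```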
The process consists of identifying the most promising combination $P_{A_i,\lambda_i}$ that satisfies a given performance metric or condition when $P_{A_i,\lambda_i}$ is trained on training data $D^{(i)}_{train}$ and evaluated on test data $D^{(i)}_{test}$.

Current literature [5, 18] reports a variety of general-purpose AutoML methods. According to Chen et al. [17], two taxonomies can categorise these methods. First, a "what" taxonomy determines which stages of a ML pipeline are going to be automated (e.g., data preprocessing and algorithm selection, algorithm selection and hyperparameters, or even the entire pipeline). Within this taxonomy, the most common case is the CASH (Combined Algorithm Selection and Hyperparameter optimisation) problem [13], wherein AutoML focuses on finding the best combination of ML algorithm and hyperparameter setting, leaving the data preprocessing up to the human user. In this paper, we focus on the automation of fixed-size ML pipelines composed of data preprocessing techniques and a classification algorithm with their respective hyperparameter configurations.

In contrast, the second taxonomy proposed by [17] classifies how the search for the most promising pipeline is carried out. On the one hand, some AutoML methods use only an optimisation strategy, wherein the ML pipeline is built by testing multiple candidate combinations from a predefined search space of preprocessing methods, ML algorithms and hyperparameter configurations. From this perspective, the ML pipeline building problem consists of finding a pipeline structure $P^{*}_{A,\lambda}$ that minimises a cross-validation loss function:

$$P^{*}_{A,\lambda} \in \arg\min_{A \in \mathcal{A},\ \lambda \in \Lambda} \ \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}\left(P_{A,\lambda}, D^{(k)}_{train}, D^{(k)}_{valid}\right) \quad (2)$$

As shown in Eq. 2, this search process can be considered a black-box optimisation problem that is not easily solvable, as the search space can be large and complex. The objective is usually non-smooth and derivative-free, and convergence speed is a critical issue when building ML pipelines. Some methods to solve it are grid search, random search, Bayesian optimisation and Sequential Model-Based Optimisation.

On the other hand, regarding the second taxonomy proposed by [17], other AutoML methods combine the aforementioned optimisation with learning strategies to form a hybrid search strategy whose purpose is to reduce computational costs. In this case, the focus is on applying a ML algorithm at the meta-level to learn meta-knowledge that guides the AutoML process; this approach is known as meta-learning [7, 14]. Meta-learning is the data-driven task of systematically observing how different learning algorithms or pipelines perform on different learning tasks and then learning from this experience to warm-start the optimisation process on a new and unknown ML task. This warm start consists of promising pipelines that the optimisation evaluates first, before trying pipelines drawn from the predefined search space.

Meta-learning can extract meta-knowledge using three different strategies [14]. First, there is the "Learning from prior evaluations" strategy, wherein a set of previously known learning tasks $t_j \in T$, a set of configurations $\lambda_i \in \Lambda$ (e.g., hyperparameter settings), and the set of all prior evaluations $P_{i,j}$ obtained by applying the configurations $\Lambda$ to the tasks $T$ have to be given. With this knowledge, the objective is to train a meta-learner $L$ able to recommend promising hyperparameter configurations $\lambda_i$ for a new and unseen task $t_{new}$.
In contrast, the second approach is known as "Learning from task properties". It is based on characterising the previously known learning tasks $t_j \in T$ using meta-features $m_j \in M$ (e.g., number of instances and features, class imbalance), then extracting the configurations $\lambda_i \in \Lambda$ of the learners associated with these prior tasks, and finally collecting the performance $P_{i,j}$ of the trained models given the meta-features $m_j$ and the configurations $\lambda_i$. With this meta-knowledge, the objective is to train a meta-learner $L$ that predicts the performance of pipelines, or recommends pipelines, for a new and unseen task $t_{new}$. Lastly, the third approach is "Learning from prior models". In this case, the focus is on training a meta-learner given the parameters $m_j \in M$ (e.g. model parameters, learnt features) of prior learnt models, their configurations $\lambda_i$, and the performance $P_{i,j}$ of these learners over the previously known tasks. The objective is then to train a meta-learner $L$ that transfers trained models in order to save computational costs when approaching a new task $t_{new}$. Within this paper, we focus on Auto-sklearn, an AutoML method with a hybrid search strategy that includes optimisation and meta-learning based on the "Learning from task properties" approach. This method is presented in more detail in the following section.

Auto-sklearn is an AutoML method that uses meta-learning, Bayesian optimisation and ensemble selection to find promising ML pipelines composed of preprocessing methods and ML classifiers. Here we provide a brief description of the method; the interested reader is referred to [3] for further details. In an off-line phase, for a repository of 121 data-sets, Bayesian optimisation is used to determine an optimised ML pipeline with high performance on each data-set. These pipelines are generated from a search space of 15 classifiers, 14 feature preprocessing methods and 4 data preprocessing methods. Then, for each data-set, a set of 38 meta-features is extracted to characterise it; these meta-features include simple, information-theoretic and statistical measures such as the number of data points, features and classes, data skewness, and the entropy of the targets, among others. Instead of storing the 121 data-sets themselves, their meta-features and the corresponding optimised ML pipelines are saved in a meta-knowledge base, wherein each instance contains the set of meta-features describing a data-set and the optimised pipeline that works well on it.

In the online phase, that is, when a new data-set $D_{new}$ is given, Auto-sklearn computes its meta-features, ranks all the data-sets stored in the meta-knowledge base (stored in the form of meta-features and not the data itself) by their L1 distance w.r.t. $D_{new}$, and selects the stored ML pipelines of the $k$ nearest data-sets (by default $k = 25$). The assumption is that these selected pipelines are likely to perform well on $D_{new}$, as they performed well on data-sets with similar meta-features (pipelines closer to the first position of the ranking are expected to perform better on $D_{new}$). This selection of the $k$ most promising pipelines is then used to seed the Bayesian optimisation component as a warm start, which boosts the performance of the optimisation. In addition to the recommendations made by the meta-learning component, the Bayesian optimisation process (under a time budget constraint) generates and tests new pipeline structures from the same search space.
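As an illustration of the warm-start mechanism just described (this is not Auto-sklearn's actual implementation, which uses 38 meta-features and its own normalisation; the meta-features, function names and data structures below are our own simplifying assumptions), a sketch of ranking stored data-sets by the L1 distance of their meta-features could look as follows:

```python
# Illustrative only: a much-simplified "learning from task properties" warm start.
import numpy as np

def meta_features(X, y):
    """A handful of simple meta-features describing a classification data-set."""
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = -np.sum(p * np.log2(p))
    return np.array([np.log(n), np.log(d), len(counts), class_entropy])

def rank_stored_datasets(new_mf, knowledge_base):
    """Rank stored data-sets by L1 distance of (normalised) meta-features to the new task."""
    names = list(knowledge_base)
    M = np.array([knowledge_base[name] for name in names])
    scale = M.std(axis=0) + 1e-12                    # simple per-feature normalisation
    dists = np.abs((M - new_mf) / scale).sum(axis=1)
    order = np.argsort(dists)                        # smallest distance first
    return [(names[i], float(dists[i])) for i in order]

# knowledge_base = {"dataset_a": meta_features(Xa, ya), ...}   # built off-line
# ranking = rank_stored_datasets(meta_features(X_new, y_new), knowledge_base)
# warm_start = [stored_pipeline[name] for name, _ in ranking[:25]]  # k = 25 nearest
```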
In the final step of Auto-sklearn's workflow, the best pipelines identified during the Bayesian search process are used to construct an ensemble (an illustrative sketch of this idea is given at the end of this section). This automated ensemble construction avoids committing to a single hyperparameter setting, and it is more robust than using only the best pipeline found by the optimisation component.

Among the most representative AutoML methods are Auto-WEKA [13], Auto-sklearn [3], TPOT [10], ATM [12] and ML-Plan [9]. The first two focus on the construction of fixed ML pipelines in which the pipeline structure is a linear sequence of data preprocessing and a learning algorithm. The other methods build pipeline structures that can be more complex and diverse. Auto-WEKA, TPOT, ATM and ML-Plan use an optimisation approach to find pipeline structures, whereas Auto-sklearn is the state-of-the-art method for generating ML pipelines with a hybrid search strategy. As a common denominator, all these AutoML methods are agnostic w.r.t. the problem domains in which they have been applied; in this sense, they are general-purpose methods that have shown competitive performance in different application areas [5].

In the transportation area, to the best of the authors' knowledge, only three papers have used AutoML methods for TF [1, 2, 15]. The first, carried out by Vlahogianni et al. [15], proposed a meta-modelling technique that, based on surrogate modelling and a genetic algorithm with an island model, optimises both algorithm selection and hyperparameter setting. The AutoML task is performed over an algorithm base of three ML methods (Neural Network, Support Vector Machine and Radial Basis Function) that forecast average speed over a time horizon of 5 min, using a regression approach. After that, Angarita et al. in [1] and [2] used Auto-WEKA, an AutoML method that applies sequential model-based Bayesian optimisation [4] to find optimal ML pipelines. Both papers compared the performance of Auto-WEKA against the general approach, which consists of selecting by trial and error the best of a set of algorithms to predict traffic. In the case of [2], the paper was centred on forecasting traffic Level of Service (LoS) at a fixed freeway location over multiple time horizons. On the other hand, in [1] the authors focused on predicting traffic speed on a subset of families of TF regression problems, making predictions at the point and road segment levels within freeway and urban environments.

The main differences between this research and the three aforementioned papers lie in the type of AutoML method used and the TF problems addressed. Whereas the previous three focused on "pure" AutoML optimisation approaches, in this research we centre on a hybrid strategy based on meta-learning, optimisation and ensemble learning, with the purpose of evaluating the benefits that meta-learning brings to TF in three scenarios: on its own (without optimisation and ensemble learning), in combination solely with ensemble learning, and integrated with optimisation plus ensemble learning.

This research seeks to keep exploring the benefits that meta-learning within hybrid-search AutoML can bring to TF. To accomplish this purpose, we analyse to what extent Auto-sklearn, the state-of-the-art AutoML method for this category of search strategies, is able to recommend competitive ML pipelines for TF.
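Regarding the ensemble construction step mentioned at the beginning of this section, [3] describes it as an ensemble selection over the pipelines evaluated during the search. Purely as an illustration, and under our own simplifying assumptions (plain accuracy as the selection criterion, hypothetical function names, labels encoded 0..C-1), a greedy version of that idea could be sketched as:

```python
# Illustrative only: greedy ensemble selection (with replacement) over held-out
# class-probability predictions; not Auto-sklearn's actual implementation.
import numpy as np
from sklearn.metrics import accuracy_score

def greedy_ensemble(proba_list, y_valid, ensemble_size=25):
    """Greedily add the model whose inclusion best improves the validation
    score of the averaged class probabilities; return per-model weights."""
    chosen = []
    current_sum = np.zeros_like(proba_list[0])
    for _ in range(ensemble_size):
        scores = []
        for proba in proba_list:
            avg = (current_sum + proba) / (len(chosen) + 1)
            scores.append(accuracy_score(y_valid, avg.argmax(axis=1)))
        best = int(np.argmax(scores))
        chosen.append(best)
        current_sum += proba_list[best]
    # Weight of each model = how often it was selected.
    return np.bincount(chosen, minlength=len(proba_list)) / ensemble_size

# proba_list = [m.predict_proba(X_valid) for m in fitted_pipelines]
# weights = greedy_ensemble(proba_list, y_valid)
```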
In this context, the remainder of this section gives more details about the data-sets used for the experimentation (Sect. 4.1) and the experimental set-up of this study (Sect. 4.2).

For the experimentation, we considered two TF environments: freeway and urban. For the freeway environment, the data was collected from the Caltrans Performance Measurement System, whereas for the urban one the data was collected from the Madrid Open Data Portal. In both cases, the traffic measure used was three months of traffic speed, aggregated in intervals of 5 and 15 min, respectively. For more details about the raw data used to generate the data-sets employed in this research, the interested reader is referred to [1].

Concretely, we approach two types of TF classification problems, with two problem instances for each of them. In both problems, the objective is to predict a categorical measure named LoS, posed as a multi-class classification problem based on continuous traffic speed. LoS categorises the quality levels of traffic through letters from A to E in a gradual way [11]. The first TF problem corresponds to the prediction of LoS at a target location in a freeway environment. The first instance of this problem is based only on past traffic speed data of the target location (temporal data, T), whereas the second instance considers historical traffic data coming from the target location and from four downstream positions (temporal and spatial data, TS). It is important to clarify that these two instances of the first TF problem are correlated because they share the same target location. The second TF problem focuses on forecasting LoS within an urban context, independent of the freeway data described above. Likewise, the two correlated instances of this problem are: predicting LoS for a single target location considering exclusively historical data of this spot, and forecasting LoS taking into account past traffic speed of the target location together with that of four downstream positions.

For the two TF problems described above, we generated 36 data-sets (20 for freeway data and 16 for urban data). In the freeway case, the time horizons at which LoS is predicted are 5, 15, 30, 45 and 60 min, using a data granularity of 5 min (granularity refers to how often and over what period the traffic measure is aggregated). For the urban TF problem, in contrast, the forecasting time horizons are 15, 30, 45 and 60 min, with a data granularity of 15 min. To better identify the data-sets, they are named following the structure Context InputData TimeHorizon.

The attributes of the freeway and urban data-sets whose input is composed only of temporal traffic data from the target location and calendar data are: 1) day of the week and minute of the day; 2) traffic speed of the target location at the past 5, 10, 15, 20, 25, 30, 35, 40 and 45 min for freeway, and at the past 15, 30, 45, 60, 75, 90, 105, 120 and 135 min for urban; and 3) LoS at the target location. In the case of the freeway and urban data-sets whose input consists of historical speed taken from the target location and from four downstream detectors, the attributes are the same as those mentioned above for the target location, plus the traffic speed of the four downstream locations at the same past times. Table 1 presents a summary of the 36 data-sets, including the number of instances, the number of attributes, the number of instances per class and the Imbalance Ratios (IRs) of each data-set.
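Purely as an illustration of how the temporal (T) attributes described above can be assembled (the DataFrame layout, column names and helper function below are our own assumptions, not the authors' preprocessing code), a pandas sketch could be:

```python
# Illustrative only: building calendar + lagged-speed attributes and the LoS target
# from a hypothetical DataFrame `df` with 5-min readings for one detector.
import pandas as pd

def build_temporal_dataset(df, horizon_steps=1, n_lags=9):
    """df: columns ['timestamp', 'speed', 'los'] at 5-min granularity.
    Returns one row per time step with day-of-week, minute-of-day,
    the past n_lags speed readings, and the LoS `horizon_steps` ahead."""
    out = pd.DataFrame({
        "day_of_week": df["timestamp"].dt.dayofweek,
        "minute_of_day": df["timestamp"].dt.hour * 60 + df["timestamp"].dt.minute,
    })
    for lag in range(1, n_lags + 1):               # speeds at past 5, 10, ..., 45 min
        out[f"speed_t-{5 * lag}min"] = df["speed"].shift(lag)
    out["los_target"] = df["los"].shift(-horizon_steps)   # LoS at the forecast horizon
    return out.dropna().reset_index(drop=True)

# Freeway data, 15-min horizon with 5-min granularity -> horizon_steps = 3:
# data = build_temporal_dataset(df, horizon_steps=3)
```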
The IR is calculated by dividing the number of instances of the majority class by the number of instances of each of the other classes. The IR values show that the generated data-sets have different degrees of imbalance. Some data-sets do not contain all the possible classes because, on some occasions, some of the classes had an extremely low presence (e.g. 20 samples), which introduced noise in the results. Samples of these classes were re-labelled with the closest class having the lowest number of samples. Moreover, the difference between freeway and urban data-sets of Type I and Type II lies in their class distributions; within each type, the class distribution is the same for all time horizons. In this sense, we can explore the capacity of Auto-sklearn when approaching data-sets with different degrees of imbalance.

Considering the traffic forecasting setting presented, we explore the performance of Auto-sklearn's components through three scenarios using the data-sets presented above. The first is a default scenario in which the hybrid search strategy of the AutoML method uses its three components to find pipelines. In this case, we considered three execution times (ET) for Auto-sklearn: 15, 60 and 120 min. They correspond to the time that the Bayesian optimisation can take to find the best pipelines and their hyperparameter configurations for a given data-set. To assess the performance of this scenario, the data-sets are partitioned into training (80%) and test (20%) sets, keeping the chronological order of the data.

In the second scenario, we test an alternative approach in which the recommendations made via meta-learning are combined into two weighted-voting ensembles, without using the optimisation process. First, we extract the 25 best ML pipelines (the default value used by Auto-sklearn) suggested by the meta-learning component, which are then combined into the weighted-voting ensemble named MetaEns25. To test this ensemble, the data-sets are partitioned in the same way as for Auto-sklearn, as described above. For the second ensemble, we extract the complete list of 121 ML pipelines that can be suggested by the meta-learning component and again choose the 25 best pipelines, based on their validation error, to generate the ensemble MetaEns25-121. In this case, we follow this procedure: the data-sets are partitioned into training (60%), validation (20%) and test (20%) sets. The 121 candidate pipelines are trained on the training set and their performance is assessed on the validation set. The ensemble is then built with the 25 pipelines showing the best validation error. Finally, the ensemble is trained on the training+validation partitions (the same number of instances as in the previous strategies, that is, 80%) and evaluated on the test set.

Lastly, for the third test scenario, we consider the meta-learning component in isolation. We follow an approach similar to that of MetaEns25-121, that is, we split the data-sets into training (60%), validation (20%) and test (20%) sets. For every data-set, we train the 121 pipelines suggested by meta-learning on the training set and, based on their error on the validation set, we choose the best pipeline. The latter is then trained on training+validation (that is, on 80% of the instances) and assessed on the test set.

To evaluate the experimental set-up presented, we use the multi-class G-measure metric (mGM), applied to multi-class imbalanced classification problems and defined as the geometric mean of the per-class recalls, $mGM = \left( \prod_{i=1}^{M} recall_i \right)^{1/M}$, where $M$ is the total number of classes and $recall_i$ is the recall obtained on class $i$. The results of all the approaches are reported in Table 2.
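To make the first scenario concrete, the following sketch shows how an Auto-sklearn run with a 60-min execution time could be set up and evaluated with mGM. It assumes the auto-sklearn 1.x API; the per-run time limit, ensemble size and seed shown here are illustrative choices, not the exact configuration used in the paper:

```python
# Illustrative only: scenario I (meta-learning warm start + Bayesian optimisation
# + ensembling) with ET = 60 min, and mGM as the geometric mean of per-class recalls.
import numpy as np
import autosklearn.classification
from sklearn.metrics import recall_score

def multiclass_gmean(y_true, y_pred):
    recalls = recall_score(y_true, y_pred, average=None)   # one recall per class
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60 * 60,              # ET = 60 min
    per_run_time_limit=360,                       # cap per candidate pipeline (example value)
    initial_configurations_via_metalearning=25,   # meta-learning warm start (default k)
    ensemble_size=50,                             # example ensemble size
    seed=1,
)

# With a chronological 80/20 split, as in the experimental set-up:
# automl.fit(X_train, y_train)
# print(multiclass_gmean(y_test, automl.predict(X_test)))
```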
This section presents the results obtained with the experimental set-up proposed in the previous section. Table 2 shows the mean mGM values obtained by the three execution times (ET) of Auto-sklearn (AutoS ET), the two voting ensembles (MetaEns25 and MetaEns25-121) and the best pipeline in validation from the meta-learning component (BestPipe Val). These mGM values were calculated by carrying out five repetitions of each approach on every data-set. mGM values in bold indicate the best result achieved on each data-set. Besides, the last column of Table 2 indicates the winning approach in terms of performance on each data-set. In the cases where the best performance is obtained by the BestPipe Val approach, we report the following information: the first pair in brackets gives the ranking position of the winning pipeline according to the similarity metric used by Auto-sklearn and the difference between its metric value and that of the pipeline in the first position of the ranking, whereas the second tuple in brackets gives the values of the metric for the pipelines in positions 1, 25 and 121 (this information appears in the same order in the "Winner" column of Table 2). In this way, we can observe whether there is a positive correlation between the ranking positions and the actual performance of the pipelines. The assumption of Auto-sklearn is that pipelines closer to position 1 (distances near 0) are likely to perform better on the input data.

From Table 2, the following highlights can be extracted regarding the behaviour of the AutoML approaches compared. The BestPipe Val component is by far the best performing approach when making traffic predictions. Concretely, it suggests the best pipeline in 33 out of 36 data-sets, performing even better than the longest ET (120 min) of Auto-sklearn. However, these results also show that the distance measure on which Auto-sklearn is based is not well correlated with performance, as we explain below. If we check the winner pipelines in the last column of Table 2 carefully, only in 5 cases (data-sets Fw TS+CD 5 - Type II, Ub T+CD 15 - Type I, Ub T+CD 15 - Type II, Ub T+CD 60 - Type II and Ub TS+CD 60 - Type II) is the winning pipeline located above position 25 of the ranking, that is, within the 25 pipelines that, as stated in Sect. 2.1, Auto-sklearn recommends by default as the most likely to perform well on the input data. Such a recommendation is made by the similarity metric, which compares the meta-features of the input data-set against the meta-features stored in the meta-knowledge base. Based on this comparison, the similarity metric chooses the best pipelines found in Auto-sklearn's off-line phase for each of the 25 data-sets most similar to the one at hand. According to the results of Table 2, the meta-features used for the comparison are not working properly: they provide information that makes the similarity metric leave out competitive pipelines located beyond position 25. In conclusion, the majority of pipelines in the Winner column of Table 2 are associated with data-sets that, under Auto-sklearn's current meta-feature comparison, are not being categorised as similar to the TF data-sets.

For the default scenario in which Auto-sklearn uses its three components to find competitive pipelines, longer ETs are expected to improve the final prediction results.
However, the improvements only range from approximately 0.01 to 0.07 in the best cases (e.g., Fw TS+CD 5 - Type I). This could be due to the fact that the meta-learning component suggests low-performance pipelines for the warm-start process of the optimisation component. Contrary to this tendency are data-sets Fw TS+CD 15 - Type I, Fw T+CD 15 - Type II and Ub TS+CD 15 - Type I, wherein the best mGM value is found with an ET shorter than 120 min. We observed that this worsening is due to over-fitting produced by Auto-sklearn's hyperparameter tuning of the recommended pipelines. This result indicates that it is necessary to introduce mechanisms in the hybrid search strategy of AutoML to deal with over-fitting, especially when the execution times of the optimisation are longer.

Regarding the performance of the two weighted-voting ensemble approaches (MetaEns25 and MetaEns25-121), the results of MetaEns25-121 are quite similar to those obtained when the optimisation component is taken into account. Concretely, in data-sets of freeway Types I-II and urban Type II, MetaEns25-121 is able to outperform Auto-sklearn in multiple cases. In particular, in data-sets Fw TS+CD 45 - Type II, Fw TS+CD 45 - Type I and Fw TS+CD 5 - Type I, the performance of MetaEns25-121 is better than that of any of Auto-sklearn's ETs. This can be explained by the fact that this ensemble is built using already optimised pipelines located beyond position 25 of the ranking; as stated before, those positions contain competitive pipelines whose performance is boosted by the ensemble without the need for further optimisation. In the case of MetaEns25, its performance is lower than that of MetaEns25-121 and the three ETs of Auto-sklearn. However, it is interesting to note that neither ensemble is better than the best pipeline suggested via meta-learning; in this sense, it would be interesting to explore why the ensembles obtain worse performance than the best pipeline in isolation. As computational cost is a key factor in AutoML, Table 3 shows the execution times in minutes that BestPipe Val and the two meta-ensembles took to make predictions on every data-set. As can be seen, in the majority of cases the three approaches spent less than 60 min, which is the second-longest ET of Auto-sklearn.

Finally, we discuss additional observations that hold regardless of which approach achieves the highest mGM values. In the freeway Type I-II and urban Type I data-sets, as the prediction time horizon increases, the performance of all approaches decreases. Among these data-sets, the ones with a time horizon of five minutes are the TF problems in which the six approaches perform best. Besides, in data-sets Fw TS+CD 30 and Fw TS+CD 60, most approaches have problems predicting the minority classes and therefore their mGM values are equal to zero in these cases. Regarding the urban Type I data-sets with only temporal traffic data (T), the three ETs of Auto-sklearn and the two ensembles have the lowest performance. This is due to the fact that these data-sets have the highest IRs (IR(A/B) = 1.12, IR(A/C) = 12.04). This demonstrates that Auto-sklearn does not incorporate in its inner structure mechanisms to deal with highly imbalanced classification data-sets.
Meanwhile, in the case of the urban Type I data-sets with spatial and temporal data (TS) and all urban Type II data-sets, the performance of the six approaches is quite acceptable and homogeneous across them. This behaviour can be explained by the fact that these 12 data-sets are the most balanced of the 36 (IR(B/A) = 2.40, IR(B/C) = 4.98; IR(A/B) = 1.02, IR(B/A) = 1.39).

In this paper, we studied the benefits in terms of performance and computational cost of hybrid AutoML for TF. We used Auto-sklearn, a state-of-the-art hybrid AutoML method whose pipeline search strategy combines Bayesian optimisation, meta-learning and ensemble learning. We focused on how well Auto-sklearn is able to recommend competitive ML pipelines to forecast traffic, modelled as a multi-class imbalanced classification problem, along different time horizons in urban and freeway environments.

From the results, we drew interesting conclusions. A simple approach based on estimating the best pipeline from Auto-sklearn's meta-learning component is able to suggest competitive pipelines that perform better than the three ETs of Auto-sklearn considered and the two weighted-voting ensembles. However, these winner pipelines were usually not included in the 25 suggestions made by default by Auto-sklearn's meta-learning component. Instead, they were located further down the ranking, which suggests that the meta-features and the similarity metric in charge of recommending pipelines are not performing as expected for these data-sets. As a result, the ranking positions are not directly related to the performance that the pipelines achieve on the TF data-sets.

Another interesting conclusion is that the optimisation component adds little to the final mGM values. Longer execution times for Auto-sklearn do not always lead to better results, as might be expected; this was also corroborated by previous research on the use of Auto-WEKA (another AutoML method) for TF [1, 2]. In spite of this, the performance of the optimisation process could be improved if the ranking recommended by the meta-learning component were re-organised using the validation error of these pipelines on the input data (a sketch of this idea is given below). Thus, the optimisation would only be fed with pipelines already corroborated as performing well on the data-set at hand. However, caution is needed regarding the computational cost of calculating the validation error of the 121 pipelines in the meta-knowledge base. Further research lines that we aim to explore are: I) improving the synergy between meta-learning and ensemble learning; and II) determining the TF problems in which the optimisation is strictly necessary to improve the results obtained via meta-learning.
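Purely as an illustration of the re-ranking idea mentioned above (the function, its arguments and the list of candidate pipelines are hypothetical; the scoring function could be, for example, the mGM sketched earlier), a possible implementation could look like this:

```python
# Illustrative only: re-order meta-learned candidate pipelines by their validation
# score on the data-set at hand before seeding the optimisation with the best ones.
from sklearn.base import clone

def rerank_by_validation_score(candidate_pipelines, X_train, y_train,
                               X_valid, y_valid, score_fn):
    """Train each candidate on the training split, score it on the validation
    split, and return the candidates sorted from best to worst."""
    scored = []
    for pipe in candidate_pipelines:
        model = clone(pipe).fit(X_train, y_train)
        scored.append((score_fn(y_valid, model.predict(X_valid)), pipe))
    scored.sort(key=lambda item: item[0], reverse=True)   # best validation score first
    return [pipe for _, pipe in scored]

# warm_start = rerank_by_validation_score(candidates, X_tr, y_tr, X_va, y_va,
#                                         score_fn=multiclass_gmean)[:25]
```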
[1] Evaluating automated machine learning on supervised regression traffic forecasting problems
[2] A preliminary study on automatic algorithm selection for short-term traffic forecasting
[3] Efficient and robust automated machine learning
[4] Sequential model-based optimization for general algorithm configuration
[5] Automated Machine Learning: Methods, Systems, Challenges
[6] Automated algorithm selection: survey and perspectives
[7] Metalearning: a survey of trends and technologies
[8] A review of automatic selection methods for machine learning algorithms and hyper-parameter values
[9] ML-Plan: automated machine learning via hierarchical planning
[10] Evaluation of a tree-based pipeline optimization tool for automating data science
[11] Major Highway Performance Ratings and Bottleneck Inventory. Maryland State Highway Administration, the Baltimore Metropolitan Council and Maryland Transportation Authority
[12] ATM: a distributed, collaborative, scalable system for automated machine learning
[13] Auto-WEKA
[14] Meta-learning: a survey
[15] Optimization of traffic forecasting: intelligent surrogate modeling. Engineering and Applied Sciences Optimization (OPT-i) - Professor Matthew G
[16] Short-term traffic forecasting: where we are and where we're going
[17] Taking human out of learning applications: a survey on automated machine learning
[18] Survey on automated machine learning

Acknowledgments. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 815069 and the Marie Skłodowska-Curie grant agreement No. 665959.