key: cord-0231172-imgkn5ea authors: Hua, Mingzhuang; Pereira, Francisco Camara; Jiang, Yu; Chen, Xuewu title: Transfer learning for cross-modal demand prediction of bike-share and public transit date: 2022-03-17 journal: nan DOI: nan sha: d50c51f1c134dad50c23d64314165a4a3a0e8bdf doc_id: 231172 cord_uid: imgkn5ea

The urban transportation system is a combination of multiple transport modes, and interdependencies exist across those modes. This means that the travel demand across different travel modes could be correlated, as one mode may receive demand from or create demand for another mode, not to mention natural correlations between different demand time series due to general demand flow patterns across the network. Cross-modal ripple effects are expected to become more prevalent with Mobility as a Service. Therefore, by propagating demand data across modes, a better demand prediction could be obtained. To this end, this study explores various machine learning models and transfer learning strategies for cross-modal demand prediction. The trip data of bike-share, metro, and taxi are processed into station-level passenger flows, and the proposed prediction method is then tested in large-scale case studies of Nanjing and Chicago. The results suggest that prediction models with transfer learning perform better than unimodal prediction models. Furthermore, the stacked Long Short-Term Memory model performs particularly well in cross-modal demand prediction. These results verify our combined method's forecasting improvement over existing benchmarks and demonstrate good transferability for cross-modal demand prediction in multiple cities.

Compared with only a few decades back, today's transportation system is much more large-scale, heterogeneous, and dynamic. It is heterogeneous because a myriad of new services exists, such as car sharing, ride sharing, shared micro-mobility (bike-share and scooter-share), and Mobility as a Service (MaaS) [1].
Autonomous shuttles already operate in several places, and even existing traditional modes, such as metro, bus, or taxi, have their modernized versions, often with smartphone apps, increased electrification, and autonomy. The options for the traveler are certainly more varied today than before. It is also dynamic because these new technologies allow for within-day (sometimes real-time) repositioning and control. Taxi and ride sharing companies often redistribute their fleets during the day, and shared micro-mobility and car sharing companies often rebalance at least during the night. Pricing of services can vary by time and zone. Many cities around the world operate these new mobility services, which can provide users with convenient options, cost-saving benefits, and safe services [2]. As a typical micro-mobility service, bike-share has been shown to help reduce traffic congestion [3], improve health [4], and protect the environment [5]. These novel mobility services can also solve the first- or last-mile problem of public transit (metro or bus) [6] and play a key role in urban transportation. In such a dynamic system, the risk of supply-demand misalignment is intuitively greater than in a static system, and accurate demand predictions are vital for responding efficiently to demand. Most existing studies predict real-time demand separately for each mode, e.g., for shared micro-mobility or for public transit. Thus, even though users can interchange between various transport modes, operators have difficulty in providing collaborative operation of new mobility services and public transit. The key challenge of the collaboration is the lack of demand information exchange among multiple transport modes [7]. For example, bike-share companies cannot observe, and therefore cannot predict, metro passenger flow, and metro companies likewise cannot observe or predict bike-share passenger flow.
In the absence of demand information from other transport modes, it is virtually impossible to provide collaborative service and build the MaaS platform. Therefore, it is essential to address the large-scale problem of forecasting demand across different transport modes. This study is dedicated to filling this research gap, particularly considering cross-modal predictions between bike-share, metro, and taxi. The concept of cross-modal prediction demands clarification: it refers to predictions for a certain mode (target) that use information from another mode (source). This includes situations where there is missing data for the target mode but rich data for the source mode (e.g., using metro data to predict bike-share demand soon after it is introduced in a city). It also includes situations of jointly predicting two or more modes given the aggregation of their datasets in a combined model. Therefore, cross-modal prediction is ultimately about data fusion across modes, taking advantage of inter-modal correlations to enhance predictability and data quality. Of interest is also the concept of transfer learning, which is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned [8]. Transfer learning can help to solve the above two problems because transfer learning methods can bring forecasting knowledge into new mobility services such as MaaS from traditional transport modes, which may have been in operation for many years. There is a research gap in transfer learning among multiple transport modes. In the existing studies of transport demand prediction, transfer learning has gained initial applications, enabling knowledge transfer across time and space. However, these studies [9], [10], [11] focus only on a single transport mode, missing the opportunity of knowledge transfer between different transport modes.
In order to build a transfer learning model among different transport modes, we need to first determine the input-output frames. There are two input-output frames: single-station-input single-station-output (SISO), and multiple-station-input multiple-station-output (MIMO). The SISO frame is popular in demand prediction, and existing MIMO papers are very limited. However, the SISO framework has two insurmountable drawbacks. The first drawback is that SISO consumes too much computing time in large-scale cases. The actual operation should consider hundreds or thousands of stations (or other spatial objects) and predicting the station demand one by one would cost too much time. The second drawback is that SISO is not flexible enough to accommodate cases of different sizes. For example, Wang et al. [9] apply cross-city transfer learning in crowd flow prediction, and the method is match-first-predict-second in the SISO frame. They firstly matched the most similar grid among different cities and then transferred parameters from the similar grid to the target grid. But the SISO method requires that the input amount in different cases be the same, i.e., three cities in Wang et al.'s study all select 400 grids as spatial objects. So, the cross-modal prediction problem should use the MIMO frame. In this study, we establish transfer learning methods that can adapt to different input amounts and adopt the MIMO frame. There is a trend of collaborative operation for multimodal mobility services in urban transportation, so cross-modal demand prediction is very important in this trend. Besides, demand prediction studies face difficulty in forecasting new services and often suffer from the problem of missing data [12] , [13] , [14] . Therefore, this paper proposes a cross-modal forecasting framework that incorporates machine learning and transfer learning based on trip data of shared micromobility (bike-share) and public transit (metro and taxi). 
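To make the difference between the two input-output frames concrete, the following is a minimal sketch of how training samples could be cut from a station-level flow matrix under each frame. The function names, the `lag` parameter, and the toy values are illustrative assumptions and are not taken from the study.

```python
from typing import List, Tuple

# flows[t][i] is the demand at station i in interval t (illustrative layout).
Flow = List[List[float]]


def mimo_samples(flows: Flow, lag: int) -> List[Tuple[List[List[float]], List[float]]]:
    """MIMO: the last `lag` intervals of ALL stations -> next interval of ALL stations."""
    samples = []
    for t in range(lag, len(flows)):
        x = [row[:] for row in flows[t - lag:t]]  # shape (lag, N)
        y = flows[t][:]                           # shape (N,)
        samples.append((x, y))
    return samples


def siso_samples(flows: Flow, lag: int, station: int) -> List[Tuple[List[float], float]]:
    """SISO: one station's own last `lag` values -> that station's next value."""
    samples = []
    for t in range(lag, len(flows)):
        x = [flows[s][station] for s in range(t - lag, t)]
        y = flows[t][station]
        samples.append((x, y))
    return samples


# Toy data: 5 intervals, 3 stations (values are made up).
flows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
mimo = mimo_samples(flows, lag=2)            # one dataset covers every station
siso = siso_samples(flows, lag=2, station=0)  # one dataset per station
```

Under SISO, a network with N stations needs N such datasets and N fitted models, whereas under MIMO a single model has N outputs, which is why only the input and output sizes change when the number of stations changes.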
The main contributions of this paper are threefold: (a) use the cross-modal method to obtain better prediction results; (b) establish a transfer learning framework that can handle the missing-data problem; (c) address the difficulty of knowledge transfer in MIMO models. To the best of our knowledge, this study is the first on cross-modal transport prediction with a transfer learning approach. The rest of this study is organized as follows. Section 2 introduces related work on machine learning prediction and transfer learning methods. Section 3 elaborates the machine learning models and transfer learning strategies in the cross-modal prediction framework. Section 4 describes the data sources and forecasting results of the proposed method. Finally, Section 5 discusses and concludes this study.

This section reviews existing studies on transport prediction and focuses on two categories, namely machine learning prediction and transfer learning prediction, as they can be applied to two challenging problems in transport forecasting: missing data and the inability to cope with new mobility services. In general, there are two types of machine learning models for transport demand forecasting: the single-mode demand model and the cross-modal demand model. A single-mode demand model is for a selected transport mode, and a cross-modal demand model is for multiple transport modes. As explained before, the joint demand model (predicting multiple transport modes in a combined model) is also considered a part of the cross-modal demand model. There are many papers discussing single-mode demand models, especially for bike-share and public transit. In bike-share demand prediction, novel machine learning methods [15], [16] are widely used. Cui et al. [17] use an advanced XGBoost method for flow forecasting and then transform the passenger flow results into bike amount recommendations near a specific metro station.
Deep neural networks are also widely used in transport demand forecasting. Zhang et al. [18] count potential bike-share users from public transport and then input this count value into a long short-term memory (LSTM) model to predict bike-share demand. Chai et al. [19] propose a multi-graph convolutional network (GCN) model to predict the station-level bike flow. There are many other studies such as decision trees and neural networks for public transit prediction [20] , [21] , [22] . Zhang et al. [23] firstly use a GCN and three-dimensional convolutional neural network (CNN) to integrate flow information and forecast metro passenger flow. Zhang et al. [24] take the flow forecast of a single metro station as a multi-input single-output regression prediction problem and use the light gradient boosting machine model to solve this problem. Jia et al. [25] propose an attention-based deep spatio-temporal network with multi-task learning and uses independent channels to model the recent, daily, and weekly metro flows. However, single-mode demand model is based on sufficient data for a transport mode and therefore cannot integrate the data of other transport modes and perform data imputation both for better predictions and for compensating for insufficient/missing data. Modern urban transport is a large-scale system consisting of multiple transport modes, including new mobility services and traditional public transit [26] . The collaborative service of urban transportation is based on demand prediction in multiple transport modes [27] . Bike-share is a popular type of shared transport, and its correlation with public transit is significant [28] . Meanwhile, bike-share is also a new micro-mobility mode, which starts providing service much later than public transit in many cities [29] . Bike-share data can provide broader spatial information for public transit prediction, and, in turn, public transit data can provide different demand features for bike share prediction. 
Hence, cross-modal demand prediction of bike-share and public transit could help to better understand their relationship and address the drawback of lacking data. It is necessary to develop a cross-modal demand prediction methodology that integrates multiple transport modes into a general prediction framework. Despite its necessity in practical operation, cross-modal demand prediction, or joint demand prediction in multiple transport modes, has not received sufficient attention in the existing studies. Only a few papers discuss joint demand prediction of multimodal urban transport. Toman et al. [30] use a vector autoregression model to estimate the usage proportions of different transport modes and apply a dynamic linear model to predict the total demand of all modes. Their two-stage method could forecast the city-level usage demand of metro, bike-share, taxi, and ride sharing in New York City. But their study takes the whole city as a single spatial object, which is not station-level and cannot be directly used in actual operations such as bike-share rebalancing and ride sharing matching. Liang et al. [31] also take New York City as the case study and propose a multi-relational spatio-temporal graph neural network for joint demand prediction in metro and ride sharing. Because of data limits, their forecast horizon (prediction time interval) is four hours, which is not short-term and cannot be directly used in dynamic operations such as ride sharing matching. Their spatial objects are 136 metro stations and 63 ride sharing zones, which is better than taking the whole city as one place but is still a small-scale prediction problem. These studies only discuss the small-scale problem of joint demand prediction, which is not practical in large-scale transport operations. Urban transport, especially MaaS, should consider large-scale networks with hundreds or even thousands of stations.
Therefore, a large-scale method for joint demand prediction is required, which is also the focus of our paper. Transport demand forecasting faces the challenge of large-scale networks [32] , [33] , and cross-modal demand forecasting makes this problem even more complicated. Both shared micromobility and public transit have hundreds or even thousands of stations. The demands for these stations need to be predicted and output at the same time. As a prerequisite for cross-modal demand forecasting, the input-output framework of MIMO [34] needs to be given full attention. But most studies focus on SISO prediction, and the research related to MIMO forecasting is relatively insufficient. Therefore, the novel machine learning models that adapt to the MIMO framework and cross-modal prediction need to be established. Transfer learning applies the knowledge of one domain to another related domain, which can provide better performance. Transfer learning has been widely applied in image processing, classification, and prediction. Ravanbakhsh et al. [35] apply Generative Adversarial Nets and a cross-channel approach in abnormal event detection, which use a discriminator as a supervisor in the video processing. Fawaz et al. [36] applied transfer learning in time series classification, and the input data are the sequences with different lengths. Zhang et al. [37] propose an advanced convolutional LSTM network to predict cellular data, in which a clustering algorithm is used to divide a city into different groups, and then a successive inter-cluster transfer learning strategy is applied for enhancing knowledge reuse. But transfer learning does not get enough attention in transport demand predicting. The existing transfer learning for transport research is limited [9] and primarily focuses on road traffic prediction. Transfer learning has different strategies, including feature extraction [35] , fine-tuning [9] , [36] , [38] , and split-brain [39] . 
Firstly, feature extraction is the basic transfer learning strategy, which keeps the model unchanged and only changes the input and output data. Li et al. [11] applied various transfer learning strategies into machine learning prediction for real-time road traffic and used the three areas of UK Highways England road networks as the case studies. They selected three points as the source and another point as the target in each area, then transferred the forecasting knowledge from source to target. They found that the feature extraction strategy performs best in their single-mode prediction study. Secondly, fine-tuning is more popular as it can overcome the differences between the source and target tasks. Wang et al. [10] combined transfer learning and deep learning to predict road traffic flow and found that the fine-tuning strategy is suitable in their study. Their method transferred the knowledge from data-rich cases into predicting short-term traffic flow in data-strapped cases, which could solve the data lacking problems. Lastly, as for split-brain, Zhang et al. [39] propose the split-brain autoencoders to obtain two disjoint sub-networks and predict one subset from another, which is applied in predicting color and grayscale of image synthesis tasks. Because of the differences in the knowledge transfer framework, the transfer learning strategies suitable for various prediction problems are also different. Transfer learning is a promising method in cross-modal transport prediction and solving data missing or lacking problems. As for data missing, many studies have demonstrated the reliability and effectiveness of transfer learning in addressing insufficient data [40] , [41] , [42] , [43] . Besides, transfer learning can be used to transfer forecasting knowledge not only between different areas on the same transport mode but also between different transportation modes in the same area. 
If the forecasting knowledge is transferred from a small-scale mode to a large-scale mode, transfer learning could also save the running time of large-scale demand prediction. Therefore, transfer learning could be a tool for predicting newly operated micro-mobility services, especially bike-share and e-scooter sharing, using available public transit data. This is explored in this study.

In this study, machine learning and transfer learning are combined for cross-modal demand prediction, transferring forecasting knowledge between bike-share and public transit. We apply a machine-learning-first, transfer-learning-second framework to this cross-modal prediction problem. The process of transferring public transit knowledge to bike-share prediction is as follows. Firstly, we use machine learning to predict the dynamic demand of public transit. Secondly, we apply different transfer learning strategies to transfer the forecasting knowledge of public transit. Thirdly, we build the machine learning models with transferred knowledge to predict the large-scale dynamic demand of bike-share. Using a similar process, the knowledge of bike-share is also transferred to the machine learning prediction of public transit. In what follows, Section 3.1 presents the machine learning part, Section 3.2 describes the transfer learning part, and, finally, Section 3.3 introduces the prediction model benchmarks and performance indices.

In this study, the stacked LSTM method is selected as the machine learning prediction model. LSTM is an elegant variant among the many recurrent neural network (RNN) models and has been widely used in transport prediction [44], [45], [46], [47]. The RNN method has the feature of network delay recursion, which allows it to capture the patterns of dynamic systems. LSTM improves the basic RNN model with internal mechanisms called gates and a memory cell. The LSTM gates consist of the forget gate, the input gate, and the output gate.
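For reference, the three gates and the memory cell can be written in a common textbook form. The notation below (input $x_t$, hidden state $h_t$, cell state $c_t$, weight matrices $W$, $U$ and biases $b$) is a standard convention assumed here for illustration, not the notation of this study.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ is the element-wise product. The additive update of $c_t$ is what lets information, and gradients, persist over many time steps.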
The LSTM model mitigates the problem of vanishing or exploding gradients and copes better with the short-term memory limitation of basic RNNs. The success of many challenging prediction tasks is generally attributed to network depth. In a stacked LSTM model, there are three types of layers: the input layer, the hidden layers, and the output layer. In the input layer, the spatio-temporal flow data (the actual demand amounts in the former intervals of all stations) are used as the input matrix. As for the hidden layers, several LSTM layers are stacked and fused into the prediction model, and each LSTM layer has many units. In the output layer, the predicted demand amounts in the future interval of all stations are the output results. The architecture of our stacked LSTM model is shown in Figure 1.

Figure 1. The prediction approach of stacked LSTM model

For the cross-modal demand prediction of bike-share and public transit, a general framework has been established across the various cities and scenarios. The detailed transfer learning frame in cross-modal prediction is as follows. Firstly, the forecast models without transfer are established separately for bike-share and public transit. Secondly, the forecast models with transfer are built by transferring public transit knowledge to bike-share prediction and bike-share knowledge to public transit prediction. Lastly, the results of the forecast models without and with transfer are compared. The architecture of transfer learning for cross-modal demand prediction is shown in Figure 2. Three transfer learning strategies are considered: fine-tuning with freezing transferred layers (FTF), fine-tuning without freezing transferred layers (FT), and split-brain (SB).

Firstly, the details of the FTF strategy are as follows. Because the input and output amounts of bike-share and public transit are different, the input and output layers of the prediction model should be tuned and cannot be frozen. The hidden layers are the transferred layers in transfer learning, which can be tuned or frozen.
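The choice between freezing and tuning the transferred hidden layers can be sketched without any deep learning library. In the sketch below, plain weight matrices stand in for LSTM layers, and all names (`build_model`, `transfer`, the `trainable` flag, the station counts) are hypothetical illustrations under the assumption that an input layer maps the station dimension to a fixed hidden width, so that hidden-layer shapes do not depend on the number of stations.

```python
import copy
import random
from typing import Any, Dict


def new_layer(n_in: int, n_out: int, rng: random.Random) -> Dict[str, Any]:
    """A layer is a weight matrix plus a flag saying whether training may update it."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return {"weights": weights, "trainable": True}


def build_model(n_stations: int, hidden_units: int, n_hidden: int, seed: int = 0) -> Dict[str, Any]:
    """Input and output sizes follow the station count; hidden layers do not."""
    rng = random.Random(seed)
    return {
        "input": new_layer(n_stations, hidden_units, rng),
        "hidden": [new_layer(hidden_units, hidden_units, rng) for _ in range(n_hidden)],
        "output": new_layer(hidden_units, n_stations, rng),
    }


def transfer(source: Dict[str, Any], n_target_stations: int, freeze: bool, seed: int = 1) -> Dict[str, Any]:
    """Build a target model with fresh input/output layers and copied hidden layers."""
    hidden_units = len(source["hidden"][0]["weights"])
    target = build_model(n_target_stations, hidden_units, len(source["hidden"]), seed)
    target["hidden"] = copy.deepcopy(source["hidden"])  # transferred layers
    for layer in target["hidden"]:
        layer["trainable"] = not freeze  # frozen layers keep the source weights
    return target


# Made-up sizes: a source mode with 100 stations, a target mode with 500.
metro_model = build_model(n_stations=100, hidden_units=8, n_hidden=2)
bike_frozen = transfer(metro_model, n_target_stations=500, freeze=True)
bike_tuned = transfer(metro_model, n_target_stations=500, freeze=False)
```

The input and output layers are always rebuilt and left trainable because their shapes depend on the number of stations, which differs between modes. In Keras, the analogous switch is the `trainable` attribute of a layer, although the study's actual implementation is not shown in the text.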
If the weights of the hidden layers from the related task are frozen in the target task, the strategy is fine-tuning with freezing transferred layers (FTF). Secondly, FT differs from the FTF strategy: if the weights of the hidden layers from the related task are tuned in the target task, the strategy is fine-tuning without freezing transferred layers (FT). Thirdly, the split-brain (SB) strategy is a novel transfer learning type with two sub-tasks. The split-brain strategy splits the prediction task into two disjoint sub-tasks and predicts the output of one subset with the model of another subset. In the split-brain prediction study of Zhang et al. [39], one sub-task predicts depth from images, while the other predicts images from depth.

Five benchmarks are used to compare the forecasting performance of the proposed method, and brief introductions of these baselines are as follows: a) One step. It takes the observed value at the former interval as the predicted value at the next interval. GCN is good at learning graph representations and has achieved superior performance in many tasks.

where $N$ represents the number of stations, $T$ represents the number of prediction intervals, $y_{i,t}$ represents the observed flow at the $i$-th station in the $t$-th interval, $\bar{y}$ represents the mean value of $y_{i,t}$, and $\hat{y}_{i,t}$ represents the predicted flow at the $i$-th station in the $t$-th interval.

The trip data used in this study are from Chicago and Nanjing. As shown in Table 1, these two cities both have large-scale multimodal transport services with more than 1,000 spatial objects. Figure 3 shows the spatial distributions and temporal characteristics of Chicago and Nanjing transport. For taxi analysis, Chicago is divided into five hundred small zones. Taxi zone centroids, bike-share stations, and metro stations are considered as the spatial objects of this study. In Chicago, taxi and bike-share services have a similar spatial distribution.
As for Nanjing, some bike-share stations are near metro stations, but others are far from them. Hourly passenger flows of typical stations in Chicago and Nanjing are also displayed in Figure 3 (c) and (d). In Chicago, the morning peak is 9 to 10 am, and the evening peak is 5 to 6 pm. The bike peak is synchronized with the taxi peak. In Nanjing, the morning peak is 9 am, and the evening peak is 3 to 7 pm. The bike evening peak is earlier than the metro evening peak. The reason for the peak difference may be that the major users of Nanjing Public Bicycle are elderly people who go grocery shopping from 3 to 4 pm, whereas metro users mainly travel for commuting, so the metro peak is 7 pm. The correlation matrix of the station-level demands for taxi and bike-share in Chicago is shown in Figure 3 (e), and the correlation matrix of the station-level demands for metro and bike-share in Nanjing is shown in Figure 3 (f). In these two subfigures, "t" stands for taxi, "b" for bike-share.

The models of this study are coded with TensorFlow in Python, and their target is to predict the inflow values at all stations in the next horizon. Before estimating model results, the experiment setting needs to be determined, which consists of the spatial objects, the time interval, and the training and test datasets. Firstly, the spatial object in each case would be reduced. The reason for spatial object reduction is that some transit are much bigger than bike-share. The reason is that public transit has many more trips than bike-share in both Chicago and Nanjing. It can be found that the fine-tuning without freezing transferred layers (FT) strategy has the best performance among all transfer learning strategies. The fine-tuning with freezing transferred layers (FTF) strategy performs slightly worse than the FT strategy. This is because changes to the parameters of the transferred layers can further improve the prediction model for different tasks.
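Comparisons such as FT versus FTF rest on error metrics computed over all N stations and T intervals. The sketch below assumes three common choices, mean absolute error, root mean square error, and the coefficient of determination, which are consistent with the symbols defined earlier (observed flow, predicted flow, and mean observed flow) but are an assumption rather than the study's confirmed index set. The function names and toy values are hypothetical. The one-step benchmark is included because its definition is given in the text.

```python
import math
from typing import Dict, List

Flow = List[List[float]]  # flows[t][i]: flow at station i in interval t


def evaluate(observed: Flow, predicted: Flow) -> Dict[str, float]:
    """Pool every station-interval pair and compute MAE, RMSE, and R^2 (assumed metrics)."""
    obs = [v for row in observed for v in row]
    pred = [v for row in predicted for v in row]
    n = len(obs)
    mean_obs = sum(obs) / n
    abs_err = sum(abs(y - p) for y, p in zip(obs, pred))
    sq_err = sum((y - p) ** 2 for y, p in zip(obs, pred))
    total_var = sum((y - mean_obs) ** 2 for y in obs)  # must be non-zero
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "r2": 1.0 - sq_err / total_var,
    }


def one_step_forecast(flows: Flow) -> Flow:
    """One-step benchmark: the value at the former interval is the prediction for the next."""
    return [row[:] for row in flows[:-1]]


# Toy data: 3 intervals, 2 stations (values are made up).
flows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
baseline = evaluate(observed=flows[1:], predicted=one_step_forecast(flows))
```

A negative R^2 for the baseline on such toy data simply means the naive forecast is worse than predicting the mean; any learned model would be compared against this kind of reference value.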
These findings also suggest that both public transit and bike-share can benefit from each other's transferred knowledge. Transfer learning proves effective in cross-modal demand prediction of bike-share and public transit. To deal with the missing-data problem, two solution frameworks are discussed in this paper. The first solution is to use longer-term data of one transport mode to build a more reliable model and then predict the demand of another mode. In this study, we also used three-month and six-month passenger flow data to build the corresponding models. The results of these two models did not improve significantly compared to the model based on one-month passenger flow data. Therefore, the solution of longer-period data is not applicable. The second solution is to directly use the split-brain (SB) strategy, with the passenger flow of one transport mode as input and the passenger flow of another mode as output. The SB strategy is found to be useful in missing-data prediction. For Chicago and Nanjing bike-share, the prediction results of the SB strategy are relatively good. This shows that if the data of large-scale spatial objects (bike-share) are missing, small-scale spatial objects (public transit: taxi or metro) can be used as a substitute for predicting the transport mode with large-scale spatial objects.

This study focuses on cross-modal demand prediction for multiple transport modes, which plays a key part in promoting service cooperation, increasing travel transfer, and improving dynamic operation. Transport demand prediction faces the problem of missing or insufficient data and cannot effectively make use of the demand information of other transport modes. To deal with these problems, a combined framework of machine learning and transfer learning is proposed. To be more specific, the stacked long short-term memory model with a fine-tuning strategy is established for cross-modal demand prediction.
Firstly, suitable spatial objects are selected, and trip datasets are processed into passenger flow data. Secondly, machine learning models are used to build the basic prediction model without knowledge transfer. Lastly, various transfer learning strategies are applied and compared to obtain the transferred information and better forecasting results. To estimate the model performance, real-world case studies are conducted on bike-share and public transit services in Chicago and Nanjing. Generally, this work provides insights on how to combine machine learning and transfer learning for multimodal demand prediction. The key findings of this study are as follows: a) The joint framework of machine learning and transfer learning is effective for the large-scale problem of cross-modal demand prediction; b) The demand information of bike-share and public transit could provide valuable transferred knowledge for predicting the passenger demand of each other; c) The fine-tuning without freezing transferred layers strategy performs the best among all transfer learning strategies; d) The split-brain strategy is effective in handling the missing-data problem; e) The stacked LSTM model can be combined with a suitable transfer learning strategy for solving the cross-modal demand prediction problem. Besides, the spatio-temporal distribution and correlation of bike-share and public transit are revealed by the visualization analysis. Bike-share and metro are mainly in a cooperative state, but bike-share and taxi have a certain degree of competitive relationship. This study of cross-modal demand prediction can be applied in several aspects. Firstly, the transfer learning approach of this study enables demand forecasting when the data of a transport mode is missing. For example, the MaaS management platform may face data transmission failures of bike-share, so that only the passenger flow data of public transport is available.
In this case, the model of this study can be used to predict the future passenger flow of bike-share to guide citywide MaaS management. Secondly, cross-modal demand forecasting can guide the dynamic operation of multiple transportation modes. For example, an increase in metro passenger flow demand can trigger the dispatch of more shared bikes to the vicinity of metro stations to meet users' travel and transfer needs. Thirdly, cross-modal demand forecasting can be applied to collaborative transportation services. The cross-modal demand forecast results can be used to infer in-vehicle crowding information for public transit and the supply-demand balance of bike-share. This information can help recommend more reasonable travel choices to users so that they make better use of multimodal transportation services.

The proposed method of cross-modal demand prediction can be further improved or extended in the following directions. Firstly, many new mobility services of MaaS do not have fixed stations, which makes the large-scale problem of cross-modal demand prediction even more challenging. For example, ride sharing and e-scooter sharing have no physical stations and move freely around the city. These new mobility services lack a default spatial object such as the bike-share station in our study, which makes their demand prediction more difficult. Therefore, a new concept of the clustering-based virtual station can be applied as the spatial object for MaaS demand prediction. Secondly, our cross-modal forecasting models need to be compatible with other mobility service datasets. New mobility services such as car sharing, stationless bike sharing, and shared automated vehicles have similarities and differences with bike-share in spatio-temporal demand distribution. Transferring the cross-modal model of this study to these new mobility services needs more research effort.
Lastly, cross-modal forecasting models should consider incorporating multi-source data, such as land use and weather conditions. It could be helpful to assess the impact of inputting more information and build a spatio-temporal demand forecasting model.

[1] Mobility as a service (MaaS): Charting a future context
[2] Should bike-sharing continue operating during the COVID-19 pandemic? Empirical findings from Nanjing, China
[3] Dockless bike sharing alleviates road congestion by complementing subway travel: Evidence from Beijing
[4] Exploring the health and spatial equity implications of the New York City Bike share system
[5] High-resolution assessment of environmental benefits of dockless bike-sharing systems based on transaction data
[6] A spatiotemporal and graph-based analysis of dockless bike sharing patterns to understand urban flows over the last mile
[7] Collaborative urban transportation: Recent advances in theory and practice
[8] Handbook of research on machine learning applications and trends: Algorithms, methods, and techniques
[9] Cross-city Transfer Learning for Deep Spatio-temporal Prediction
[10] Road traffic flow prediction using deep transfer learning
[11] Transferability improvement in short-term traffic prediction using stacked LSTM network
[12] Graph Markov network for traffic forecasting with missing data
[13] LSTM-based traffic flow prediction with missing data
[14] Impact of Data Loss for Prediction of Traffic Flow on an Urban Road Using Neural Networks
[15] The station-free sharing bike demand forecasting with a deep learning approach and large-scale datasets
[16] Citywide bike usage prediction in a bike-sharing system
[17] Usage Demand Forecast and Quantity Recommendation for Urban Shared Bicycles
[18] Short-term Prediction of Bike-sharing Usage Considering Public Transport: A LSTM Approach
[19] Bike Flow Prediction with Multi-Graph Convolutional Networks
[20] Passenger Flow Forecasting in Metro Transfer Station Based on the Combination of Singular Spectrum Analysis and AdaBoost-Weighted Extreme Learning Machine
[21] Metro Passenger Flow Prediction Model Using Attention-Based Neural Network
[22] Attention-based Graph Neural Network Enabled Method to Predict Short-term Metro Passenger Flow
[23] Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit
[24] Short-Term Passenger Flow Forecast of Rail Transit Station Based on MIC Feature Selection and ST-LightGBM considering Transfer Passenger Flow
[25] ADST: Forecasting Metro Flow Using Attention-Based Deep Spatial-Temporal Networks with Multi-Task Learning
[26] Smart mobility and public transport: Opportunities and challenges in rural and urban areas
[27] Horizontal collaborative transport: survey of solutions and practical implementation issues
[28] The Last Mile Matters: Impact of Dockless Bike Sharing on Subway Housing Price Premium
[29] Planning for Bike Share Connectivity to Rail Transit
[30] Dynamic predictive models for ridesourcing services in New York City using daily compositional data
[31] Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach
[32] Predicting station-level hourly demand in a large-scale bikesharing network: A graph convolutional neural network approach
[33] Large-scale short-term urban taxi demand forecasting using deep learning
[34] Forecasting usage and bike distribution of dockless bike-sharing using journey data
[35] Training Adversarial Discriminators for Cross-Channel Abnormal Event Detection in Crowds
[36] Transfer learning for time series classification
[37] Deep Transfer Learning for Intelligent Cellular Traffic Prediction Based on Cross-Domain Big Data
[38] Short-term predictions of multiple wind turbine power outputs based on deep neural networks with transfer learning
[39] Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction
[40] A bi-directional missing data imputation scheme based on LSTM and transfer learning for building energy data
[41] Transfer Learning Based Fault Diagnosis with Missing Data Due to Multi-Rate Sampling
[42] Fair Transfer Learning with Missing Protected Attributes
[43] Missing Modality Transfer Learning via Latent Low-Rank Constraint
[44] DeepPF: A deep learning based architecture for metro passenger flow prediction
[45] Parallel Architecture of Convolutional Bi-Directional LSTM Neural Networks for Network-Wide Metro Ridership Prediction
[46] Sequence to sequence learning with attention mechanism for short-term passenger flow prediction in large-scale metro system
[47] Multi-output Deep Learning for Bus Arrival Time Predictions