title: Event-Aware Multimodal Mobility Nowcasting
authors: Wang, Zhaonan; Jiang, Renhe; Xue, Hao; Salim, Flora D.; Song, Xuan; Shibasaki, Ryosuke
date: 2021-12-14

As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling of crowd movements is a challenging task, particularly in scenarios where societal events drive mobility behavior away from normal patterns. While tremendous progress has been made in modeling high-level spatio-temporal regularities with deep learning, most, if not all, of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to the unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) we design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) we propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters on the fly for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published at https://github.com/underdoc-wang/EAST-Net.

Mobility-as-a-Service (MaaS), an emerging paradigm of transport service, seamlessly integrates multimodal mobility services (e.g. public transport, ride-hailing, bike-sharing), which streamlines trip planning and ticketing (for users), operation optimization and emergency response (for providers), and traffic management (for city managers).
For a smooth operation of MaaS, spatio-temporal predictive modeling for multimodal transport of crowds is indispensable. However, the existing methods either handle the interaction between supply and demand of different modes only implicitly or assume it to be time-invariant (Ye et al. 2019). This task is even more challenging in scenarios where societal events (e.g. holiday, severe weather, epidemic) take place and make collective mobility deviate significantly from normal patterns (e.g. daily, weekday routines). Moreover, as illustrated in Figure 1, the impacts of different events differ, e.g. taxi demand rockets on New Year's Eve but vanishes at Christmas and during the blizzard, and the volatility brought to each transport mode varies, e.g. the recovery of share bike demand takes longer than that of taxi demand after the blizzard.

Figure 1 (caption fragment): ... Year, and a Historic Blizzard "Jonas" Took Place

Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
† Work was done while the first author was virtually visiting the CRUISE research group at RMIT University.
§ Corresponding author.

While tremendous progress has been made in spatio-temporal modeling thanks to deep learning (Shi et al. 2015; Zhang, Zheng, and Qi 2017; Li et al. 2018; Wu et al. 2019; Zheng et al. 2020), most, if not all, of these methods advance by exploiting high-level spatio-temporal regularities. The volatility brought by societal events, on the other hand, is largely downplayed and usually handled by simple rectifications, such as incorporating temporal covariates (e.g. time-of-day, day-of-week, whether-holiday) as auxiliary input (Yao et al. 2018; Zonoozi et al. 2018) or adding a memory bank to reuse similar patterns in history (Yao et al. 2019; Tang et al. 2020). These manipulations bring time and holiday awareness to a certain degree, but they mainly help with the periodic and precedented parts and would still fail under more extreme scenarios such as unprecedented events (e.g. historic blizzard, COVID-19 pandemic). There is another line of research (Fan et al. 2015; Jiang et al. 2018, 2019) attempting to capture anomalous mobility tendencies under events in an online fashion, based on a low-order Markov assumption and a fine-grained time slot setting. These practices arguably circumvent the inherent difficulty of the task instead of truly tackling it.

In this paper, we tackle the identified twofold unawareness of the existing spatio-temporal networks, namely being intermodality-unaware and event-unaware, correspondingly via: (1) explicitly representing the dynamic interactions among multiple mobility modes; (2) intrinsically enhancing the event-awareness and adaptivity of predictive models for various scenarios, including unprecedented events. Specifically, we design a heterogeneous information network to build the intermodal interactions into the widely adopted spatio-temporal modeling strategy, then leverage techniques of memory and dynamic filter networks that encourage the model to learn to distinguish and generalize to diverse scenarios. Based on the above two motivations, we propose an event-aware spatio-temporal network (EAST-Net). Our contributions are summarized as follows:
• We design a new heterogeneous mobility information network (HMIN) to explicitly represent intermodal interactions (or intermodality) for spatio-temporal multimodal mobility modeling.
• We propose a novel memory-augmented dynamic filter generator (MDFG) to produce sequence-specific parameters on the fly, intrinsically improving the event-awareness and adaptivity of spatio-temporal networks.
• We conduct a series of experiments on four real-world event-mobility datasets, and the results validate the superiority of EAST-Net quantitatively and qualitatively.
Here we briefly review two lines of research: the first is on modeling of event-related human mobility, while the other involves techniques for spatio-temporal forecasting and dynamic filter generation.

(1) The former can be broken down into two branches, namely event-oriented and event-driven modeling. On the one hand, human mobility data (e.g. GPS (Konishi et al. 2016), origin-destination records (Zhang, Zheng, and Yu 2018; Zhang et al. 2019), trip surveys) are commonly used as the underlying clues to infer both local and citywide anomalous events. On the other hand, mobility behavior is conversely affected by these societal events (Song et al. 2014; Fan et al. 2015; Jiang et al. 2019; Xie et al. 2020) and thereby deviates from normal patterns.

(2) By effectively capturing complex non-linear dependencies in space and time, deep learning-based predictive models as a group outperform classical statistical and matrix/tensor-based methods on both individual (e.g. call detail records (Feng et al. 2018), GPS trajectories (Fan et al. 2019), Point-of-Interest visits (Xue et al. 2021)) and collective (e.g. crowd volumes (Zhang, Zheng, and Qi 2017), demand of multiple transport modes (Ye et al. 2019), origin-destination trips (Jiang et al. 2021)) mobility modeling. Besides, having been studied mainly for tasks like video prediction (Jia et al. 2016) and image classification (Yang et al. 2019; Zhou et al. 2021), dynamic filter generation broadly shares a similar idea with model-based meta-learning (e.g. memory-augmented neural networks (Santoro et al. 2016), meta networks (Munkhdalai and Yu 2017)) and hypernetworks (Ha, Dai, and Le 2016), namely conditioning the parameters of a target module on another network. Moreover, some recent studies extend the idea to continual learning (von Oswald et al. 2020), and to meta knowledge-based parameterization (Pan et al. 2019) and spatially distinct filter generation (Cirstea et al. 2021) for spatio-temporal forecasting.
In this section, we first formulate the multimodal mobility nowcasting problem, then briefly revisit a standard solution for this task, namely the spatio-temporal network (ST-Net).

Given a specified spatio-temporal granularity, time and space can be discretized into a set of equal-length time slots and a set of regions (not necessarily equal-area), denoted by $\mathcal{T} = \{\tau_t \mid t \in (1, \cdots, T)\}$ and $\mathcal{R} = \{\eta_n \mid n \in (1, \cdots, N)\}$, respectively. Considering that there are in total $M$ modes of mobility, we can build a multimodal mobility tensor $\mathcal{M} \in \mathbb{R}^{T \times N \times C}$, where $C = 2 \cdot M$ if modeling the supply and demand of multiple transport modes, and $C = M$ if modeling the visit volume of multiple travel purposes. Accordingly, the multimodal mobility nowcasting problem can be formulated as follows: given $\alpha$-step consecutive observations in $\mathcal{M}$, denoted by $(X_{t-\alpha+1}, \cdots, X_t)$ where $X \in \mathbb{R}^{N \times C}$, return the immediate expectations for the next $\beta$ steps, i.e. $(\hat{X}_{t+1}, \cdots, \hat{X}_{t+\beta})$. Note that auxiliary temporal covariates can be available from time slot $\tau_{t-\alpha+1}$ to $\tau_{t+\beta}$, denoted by $T_{cov} \in \mathbb{R}^{(\alpha+\beta) \times v}$, where $v$ is the total number of covariates. Formally, the task is the mapping $(X_{t-\alpha+1}, \cdots, X_t;\, T_{cov}) \mapsto (\hat{X}_{t+1}, \cdots, \hat{X}_{t+\beta})$.

To solve the above problem, recent studies commonly exploit high-level spatio-temporal dependencies in observations (Zhang, Zheng, and Qi 2017; Li et al. 2018; Ye et al. 2019; Wu et al. 2020). In particular, convolutional and recurrent neural networks (e.g. CNN, GCN, TCN, RNN) are two typical submodules utilized to handle the underlying dependencies over the space $\mathcal{R}$ and time $\mathcal{T}$, respectively. This class of models arguably shares a similar spatio-temporal view, which prioritizes the first and second dimensions in $(X_{t-\alpha+1}, \cdots, X_t) \in \mathbb{R}^{\alpha \times N \times C}$. We term this modeling strategy Spatio-Temporal Graph (STG), as demonstrated in Figure 3, in which the third dimension of the observations is treated as features evolving on STG. We further term models built on top of STG Spatio-Temporal Network (ST-Net).
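The problem formulation above can be made concrete with a small sketch of how a $T \times N \times C$ mobility tensor might be cut into $\alpha$-step observations and $\beta$-step targets. This is only an illustration under assumed names (`make_windows`, the toy tensor sizes); it is not taken from the released code, and temporal covariates are left out.

```python
import numpy as np

def make_windows(mobility, alpha, beta):
    """Slice a (T, N, C) mobility tensor into (observation, target) pairs.

    Each pair holds alpha consecutive observed steps followed by the
    next beta steps that a nowcasting model would be asked to predict.
    """
    num_steps = mobility.shape[0]
    num_samples = num_steps - alpha - beta + 1
    if num_samples <= 0:
        raise ValueError("series too short for the requested alpha and beta")
    xs = np.stack([mobility[i:i + alpha] for i in range(num_samples)])
    ys = np.stack([mobility[i + alpha:i + alpha + beta] for i in range(num_samples)])
    return xs, ys

# Toy tensor (hypothetical sizes): T=20 time slots, N=4 regions,
# M=2 transport modes with supply and demand, hence C = 2 * M = 4.
T, N, M = 20, 4, 2
C = 2 * M
mobility = np.arange(T * N * C, dtype=float).reshape(T, N, C)
xs, ys = make_windows(mobility, alpha=8, beta=8)
print(xs.shape, ys.shape)
```

Each sample keeps the region and mode axes intact, so a spatial or intermodal module can operate on every step of the window.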
Without loss of generality, we combine GCN and RNN to denote an ST-Net, which handles spatial dependency by a graph and temporal dependency in a recurrent form (Equations (2) and (3)). Equation (2) defines the basic graph convolution operation $\mathcal{G}$, which takes input $X \in \mathbb{R}^{N \times p}$ and returns $H \in \mathbb{R}^{N \times q}$ given a graph topology matrix $P \in \mathbb{R}^{N \times N}$ ($\tilde{P}$ is its normalized form), approximation order $K$, and trainable parameters $\Theta \in \mathbb{R}^{(K+1) \times p \times q}$. Equation (3) defines an extended version of GRU (a form of RNN), namely GCRU, with matrix multiplications replaced by graph convolutions (Equation (2)). It is noteworthy that DCGRU can be seen as a special form of GCRU obtained by restricting $\tilde{P}$ to be a random-walk normalized transition matrix and performing bidirectional graph diffusion. Then, stacking multiple layers (denoted by $l$) of GCRU forms the encoder and decoder of an ST-Net, abbreviated as ST-Enc/ST-Dec in Figure 2.

Besides, as illustrated in Figure 2a, temporal covariates can be used as auxiliary input (Yao et al. 2018; Zonoozi et al. 2018) to equip ST-Net with time and holiday awareness. In this case, the input at each step is the concatenation of $X_t$ and $T_t$, where $T_t$ is the linearly projected representation of $T_{cov}$ at $\tau_t$. Another rectification for ST-Net (demonstrated in Figure 2b) attaches an external memory bank (Yao et al. 2019; Tang et al. 2020) to the decoder such that some typical spatio-temporal patterns can be stored for reuse. This memory is implemented by a parameter matrix $\mathbf{M} \in \mathbb{R}^{m \times D}$, where $m$ and $D$ denote the total number of memory records and the dimension of each one. Before making the final output, the decoder queries $\mathbf{M}$ to find similar representations, which is implemented by an attention mechanism (Bahdanau, Cho, and Bengio 2014; Vaswani et al. 2017). In this attention step (Equation (4)), $Q_t \in \mathbb{R}^{D}$ denotes the query vector projected from the flattened $H^{(l)}_t$, and $\phi_j$ is the attention score corresponding to the $j$-th memory record.
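As a rough illustration of the memory query just described, the sketch below scores a query vector against each memory record, turns the scores into attention weights with a softmax, and reads out a weighted sum of the records. The dot-product scoring and the names `query_memory`, `phi` and `readout` are assumptions made for this example; the text only states that an attention mechanism is used.

```python
import numpy as np

def query_memory(query, memory):
    """Attend over memory records and return the attention-weighted readout.

    query:  (D,)   query vector Q_t
    memory: (m, D) memory bank, one record per row
    """
    scores = memory @ query              # (m,) similarity to each record (assumed dot product)
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores)
    phi = weights / weights.sum()        # softmax attention scores, one per record
    readout = phi @ memory               # (D,) weighted sum of memory records
    return phi, readout

# Deterministic toy memory: m=8 records of dimension D=16, each a one-hot row.
m, D = 8, 16
memory = np.eye(m, D)
query = 5.0 * memory[3]                  # a query strongly aligned with record 3
phi, readout = query_memory(query, memory)
print(int(phi.argmax()), round(float(phi[3]), 3))
```

With a trainable memory, the rows would be learned parameters and the query would come from the decoder state rather than being built by hand.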
The obtained vector $\tilde{V}$ can be reshaped back and concatenated with $H$.

In this section, we elaborate on the motivations and techniques for improving ST-Net, and present the Event-Aware Spatio-Temporal Network (EAST-Net) as a more adaptive framework for multimodal mobility nowcasting.

As presented in Figure 3, STG, the foundation of ST-Net, prioritizes spatio-temporal modeling while restricting all features (i.e. mobility modes) to evolve together on the fixed STG. We argue that this spatio-temporal view restricts the modeling of dynamic interactions among different modes of mobility, which are in fact the operating mechanism of MaaS. As demonstrated in Figure 1, a societal event may impact different transport modes differently, which confirms the necessity of intermodality modeling. Thus, we are motivated to design a new underlying structure, i.e. the Heterogeneous Mobility Information Network (HMIN), to jointly represent intermodal interaction and spatio-temporal dependency. In HMIN, $V_{sp}$ and $V_{mo} = \{\ldots, \mu_M\}$ denote the node sets of regions and mobility modes, respectively; $E_{sp}$, $E_{mo}$, $E_{sp\text{-}mo}$, $E_{t\text{-}sp}$, $E_{t\text{-}mo}$ denote five edge sets for the region-to-region, mode-to-mode, region-to-mode, time-to-region, and time-to-mode relations. By this definition, the intermodal relationship and its dynamicity can be represented by $E_{mo}$ and $E_{t\text{-}mo}$, and the task of multimodal mobility nowcasting is reformulated as a link prediction task for edge set $E_{sp\text{-}mo}$ ($|E_{sp\text{-}mo}| = N \cdot C$) from $\tau_{t+1}$ to $\tau_{t+\beta}$.

Here we propose a simple yet generic framework to encode-decode HMIN by applying the handy GCRU in a similar fashion to ST-Net. Denoting the framework of ST-Enc/ST-Dec (multi-layer GCRU) as $H_{t+1}, \cdots, H_{t+\beta} = \mathrm{GCRU}_{Enc\text{-}Dec}(X_{t-\alpha+1}, \cdots, X_t)$, the processing of HMIN can be decomposed into two views (i.e.
spatial and intermodal) followed by a stepwise fusion layer, formally denoted by Equation (5), where $\varepsilon \in (1, \cdots, \beta)$ denotes the step index within horizon $\beta$; $H^{(sp)}_{t+\varepsilon} \in \mathbb{R}^{N \times q}$ and $H^{(mo)}_{t+\varepsilon} \in \mathbb{R}^{C \times q}$ denote the spatial and modal embeddings on $V_{sp}$ and $V_{mo}$ at time slot $\tau_{t+\varepsilon}$, respectively; $W_{out} \in \mathbb{R}^{q \times q}$ denotes a parameter matrix that fuses the node embeddings for link generation. For simplicity, we denote this framework by HMINet (Equation (5)) and consider it to be a general case of ST-Net, which only takes the spatial view and lets $W_{out} \in \mathbb{R}^{q \times C}$, $H^{(mo)}_{t+\varepsilon} = I_C$. Essentially, the edge sets $E_{sp}$ and $E_{mo}$, representing spatial and intermodal dependencies, are handled by graph convolution in each domain; the unidirectional temporal edges $E_{t\text{-}sp}$ and $E_{t\text{-}mo}$ are encoded by the recurrent structure; and HMINet learns the mapping from $\alpha$ steps to $\beta$ steps in edge set $E_{sp\text{-}mo}$.

Although it enhances ST-Net in an intermodality-aware way, HMINet introduces roughly as many extra parameters as ST-Net itself has. To control the model size and, more importantly, to empower the model to be aware of and adaptive to various scenarios, we propose a novel Memory-augmented Dynamic Filter Generator (MDFG). MDFG is motivated by a line of research on dynamic filter networks (DFN) (Jia et al. 2016; Yang et al. 2019; Zhou et al. 2021), which have been studied mainly on convolutional kernels for image and video-related tasks. The core idea behind DFN is that, instead of sharing the same trainable filter for all samples in a dataset, filters are generated dynamically, conditioned on each input sample, which by nature increases the flexibility and adaptivity of the model. In light of DFN, we argue that the indistinguishability between normal and event scenarios is rooted in the way that the same set of parameters (e.g. those of Equation (3)) is shared by vanilla ST-Net across all observational sequences $(X_{t-\alpha+1}, \cdots, X_t)$. In other words, the parameters in ST-Net are sequence-agnostic.
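To make the contrast between sequence-agnostic and sample-conditioned parameters concrete, here is a minimal sketch of the general DFN idea, not of MDFG itself. All names (`static_layer`, `dynamic_layer`, `generator`) are hypothetical, and using the mean over regions as the conditioning vector is an assumption chosen only to keep the example short.

```python
import numpy as np

def static_layer(x, weight):
    """Sequence-agnostic layer: one shared weight for every input."""
    return x @ weight

def dynamic_layer(x, generator):
    """Sequence-specific layer: the weight is generated from the input itself.

    x:         (N, p) input features for one sample
    generator: (p, p * q) linear map from a sample summary to a flattened filter
    """
    p = x.shape[1]
    q = generator.shape[1] // p
    summary = x.mean(axis=0)                        # (p,) condition vector (assumed pooling)
    weight = (summary @ generator).reshape(p, q)    # filter conditioned on this sample
    return x @ weight, weight

N, p, q = 5, 3, 2
generator = np.arange(p * p * q, dtype=float).reshape(p, p * q) / 10.0
x_normal = np.arange(N * p, dtype=float).reshape(N, p)
x_event = 0.1 * x_normal                            # e.g. a sharp drop in mobility

y_static = static_layer(x_normal, np.ones((p, q)))  # same weight regardless of input
_, w_normal = dynamic_layer(x_normal, generator)
_, w_event = dynamic_layer(x_event, generator)
print(np.allclose(w_normal, w_event))               # the generated filters differ
```

Because the generator here is linear, scaling the input scales the generated filter by the same factor, which is why the two scenarios end up with different weights while the static layer keeps one.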
We thereby utilize the idea of DFN and further condition the parameters on a plugin memory bank $\mathbf{M}_{mob} \in \mathbb{R}^{m \times D}$ to encourage the discovery of high-level mobility prototypes, which are representations incorporating spatial, temporal, and multimodal knowledge. To be specific, $\mathbf{M}_{mob}$ takes concatenated node embeddings [H … Enc-Dec. This interaction between HMINet and MDFG occurs on the fly and generates sequence-specific parameters (denoted by $\Theta_t$). In the formal definition, the generated parameters pass through a dynamic filter generation (DFG) layer, which can be implemented in various ways; without loss of generality, we utilize a linear projection in this case. $\phi$ denotes a filter normalization (FN) layer (Zhou et al. 2021), used to normalize the generated parameters and avoid vanishing and exploding gradients.

Based on HMINet and MDFG, we further make three refinements to the framework of the Event-Aware Spatio-Temporal Network (EAST-Net), as illustrated in Figure 4, which can be trained end-to-end by minimizing a specified loss function using standard backpropagation.
• Temporal covariates are fused stepwise for basic time and holiday awareness in both the spatial and intermodal views, following the common practice (Yao et al. 2019). Then, $X_{t+\varepsilon}$ is fed into HMINet (Equation (5)) as input.
• A pyramidal structure (Zonoozi et al. 2018) is leveraged in the GCRU encoders to help accelerate the training of HMINet and discover multi-level temporal patterns for mobility prototype extraction. We consider the case of a factor of 2.
• Adaptive edge sets $E_{sp}$, $E_{mo}$ are learnt in HMINet without making any prior assumptions on either the intermodal or the spatial relationship (Wu et al. 2019).
Essentially, a pair of parameterized node embeddings is initialized for each of $\mathrm{GCRU}^{(sp)}_{Enc\text{-}Dec}$ and $\mathrm{GCRU}^{(mo)}_{Enc\text{-}Dec}$ to derive the corresponding topology for graph convolutions, where the embeddings $E^{(sp)}, F^{(sp)} \in \mathbb{R}^{N \times \mu_{sp}}$ and $E^{(mo)}, F^{(mo)} \in \mathbb{R}^{C \times \mu_{mo}}$ are trained to learn the underlying region-to-region and mode-to-mode dependencies within node sets $V_{sp}$ and $V_{mo}$; the derived topology is normalized to $[0, 1]$ by softmax to simulate signal diffusion in each domain (replacing $\tilde{P}$ in Equation (2)).

To evaluate the proposed EAST-Net, we collect four real-world datasets with different spatio-temporal scales and coverage (presented in Table 1), and represent multimodal mobility with transport modes on three city-level datasets (for New York City, Washington DC, Chicago), and with travel purpose on the other, country-level dataset (for the United States). Similarly to previous studies (Zhang, Zheng, and Qi 2017; Ye et al. 2019; Jiang et al. 2019), trip records (e.g. taxi, share bike) or POI visits are processed as in/outflow (supply/demand) or visit volume, and then aggregated onto a given spatio-temporal granularity. In particular, each dataset is designed to cover a set of holidays and a historic event with big social impact, i.e. the winter storm Jonas or the COVID-19 pandemic. Following the common practice (Zonoozi et al. 2018; Yao et al. 2018), we encode the temporal covariates of each time slot (i.e. time-of-day, day-of-week, month-of-year, whether-holiday) in a one-hot manner as auxiliary sequence input. We chronologically split each dataset into training, validation, and test sets with a ratio of 7 : 1 : 2, such that the test sets cover roughly the last 20 days for JONAS-{NYC, DC}, 110 days for COVID-CHI, and 40 days for COVID-US.
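The chronological 7 : 1 : 2 split mentioned above can be sketched as follows. The function name `chronological_split` and the choice to split by integer index are assumptions for illustration; the essential point is that no shuffling takes place, so the test period always follows the training and validation periods in time.

```python
def chronological_split(num_steps, ratios=(7, 1, 2)):
    """Return (train, val, test) index ranges that preserve time order."""
    total = sum(ratios)
    train_end = num_steps * ratios[0] // total
    val_end = num_steps * (ratios[0] + ratios[1]) // total
    return range(0, train_end), range(train_end, val_end), range(val_end, num_steps)

# Toy example with 1000 time slots (hypothetical length).
train_idx, val_idx, test_idx = chronological_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under this ratio the test portion is one fifth of the series, so a test set of roughly 20 days corresponds to a dataset of roughly 100 days.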
The lengths of the observational and nowcasting sequences are set to $\alpha = 8$ and $\beta = 8$, respectively; the number of GCRU layers is $L = 2$, with approximation order $K = 3$ and hidden dimension $q = 32$; the embedding dimensions are $v = 2$ for $T_{cov}$, $\mu_{sp} = 20$ and $\mu_{mo} = 3$; the mobility prototype memory uses $m = 8$ and $D = 16$. For model training, the batch size is 32, the learning rate is $5 \times 10^{-4}$, and the maximum number of epochs is 100, with early stopping at a patience of 10; MAE is the loss optimized using Adam. We implement EAST-Net with PyTorch and carry out experiments on a GPU server with NVIDIA GeForce GTX 1080 Ti graphics cards. For evaluation, we adopt three commonly used metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

In this section, to understand the performance of our approach, we develop a group of research questions (RQ) and design a series of experiments correspondingly: (1) How does EAST-Net perform compared with the existing methods? (2) How does EAST-Net perform compared with its model variants? (3) How does EAST-Net behave in different scenarios of societal events?

Table 1 (fragment): Holidays + COVID Pandemic †
* Scooter trip data is only available in the COVID-CHI set. Travel purpose is measured by POI visitations of 10 categories: {grocery store, retailer, transportation, office, school, healthcare, entertainment, hotel, restaurant, service}, according to the NAICS industry codes (https://www.naics.com/search-naics-codes-by-industry/).
† The COVID-19 pandemic broke out in late March 2020; the COVID-US set depicts the early stage (first wave in April), and the COVID-CHI set depicts the progression (first to third waves till the end of 2020) of the pandemic.
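For reference, the three evaluation metrics named in the experimental settings can be written out as below. This is a plain-Python sketch over flat lists; skipping zero-valued ground truth in MAPE is an assumption made here to avoid division by zero (mobility volumes can drop to zero during events), and the text does not state how the paper handles that case.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent, over non-zero targets only (assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)

# Toy values (not from the paper); the third target is zero and is skipped by mape.
y_true = [10.0, 20.0, 0.0, 40.0]
y_pred = [12.0, 18.0, 3.0, 40.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes the large miss on the zero-valued slot more heavily than MAE does, which is one reason the two are usually reported together.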
Quantitative Evaluation 1. To quantitatively evaluate the overall prediction accuracy of EAST-Net on the multimodal mobility nowcasting problem, we implement eight baselines on mobility/traffic-related spatio-temporal prediction for comparison.

We present the performance comparison of EAST-Net and the baselines in Table 2. It is noticeable that the error range on the four datasets varies in magnitude: among the three city-level sets, DC and CHI have relatively smaller transport volumes than NYC; COVID-US is apparently the trickiest set, as its regions are states, it has ten modes for travel purpose, and it is tested at the very early stage (first wave) of the pandemic. Besides, the acceptable results obtained by HA on JONAS-DC, and by NF on JONAS-NYC and COVID-CHI, indicate a rather strong short-term temporal dependency in JONAS-NYC and COVID-CHI, and a daily periodicity in JONAS-DC. By treating the problem simply as time series forecasting, Transformer does not reach satisfactory accuracy. Taking spatial locality into consideration, CoST-Net performs better than Transformer on JONAS-{NYC, DC}, but its pre-trained convolutional structure not only makes it fail on COVID-CHI but also prevents it from handling graph-based data like COVID-US. Then, among the four graph-based models, GW-Net prevails in terms of most metrics on all datasets. Lastly, for EAST-Net, we observe a consistent and substantial improvement throughout JONAS-{NYC, DC} and COVID-US, which confirms the efficacy of EAST-Net. The exception on COVID-CHI, we think, can be explained by: (1) a coarse time slot (2-hour) setting "smooths" sudden changes, making the task easier for other models; (2) along with the progression (first to third waves) of the COVID pandemic, other models gradually learn the pandemic pattern as a new normality.
Quantitative Evaluation 2. To understand how EAST-Net improves on the canonical ST-Net, we implement ST-Net (Equation (3)) and its two rectified forms (Figures 2a and 2b), as well as HMINet (Equation (5)), for comparison. As presented in Table 3, within the ST-Net family, a regular memory bank improves ST-Net in most cases, but not as significantly as temporal covariates do. However, adding $T_{cov}$ deteriorates the performance on COVID-US, which is actually reasonable because the reinforced awareness of periodicity backfires, especially at the early stage of a historic epidemic when human mobility began to deviate (because of quarantine measures). In comparison, adopting HMINet reduces all error metrics compared with regular ST-Net, especially on COVID-US, which validates our motivation for explicit intermodality modeling. Besides, adopting HMIN on COVID-CHI seems not as helpful as on the other datasets. This issue, we think, may be caused by including the scooter data, which comes from a pilot program in Chicago and thus has some months without any data. Lastly, comparing HMINet and EAST-Net side by side, we observe a consistent performance improvement, which verifies the effectiveness of MDFG in various scenarios.

Qualitative Evaluation. To understand how EAST-Net behaves in diverse scenarios of societal events, we conduct two case studies on JONAS-NYC and COVID-US. In Figure 5, a clear no-mobility period is expected under the impact of the historic blizzard "Jonas". GW-Net, a state-of-the-art model according to Table 2, simply makes a naive forecast (repeating the latest observation) during this anomalous period. In contrast, EAST-Net can quickly adapt to a declining-to-zero tendency (although this causes underestimations afterwards).
In addition, as illustrated in Figure 6, the composition of mobility prototypes in memory records used for generating momentary filters is clearly differentiated between (1) normal workdays and a weekend with a holiday; (2) a long weekend and the "Jonas" period. These observations demonstrate the event-awareness and adaptivity of EAST-Net under a short-term event causing sudden volatility.

In Figure 7, stream graphs (Byron and Wattenberg 2008) of state-averaged POI visits in ten categories during the first wave of the COVID pandemic are presented. A stream graph is a variation of the stacked area graph that positions layers to minimize the weighted wiggle (sum of the squared slopes). In our case, an overall negative "tendency" is expected according to the ground truth. While an opposite, positive "tendency" is produced by GW-Net, EAST-Net captures the overall shape correctly. In detail, on Memorial Day (25 May 2020), EAST-Net also captures the clearly lower volume of POI visits better than GW-Net. In addition, according to Figure 8, EAST-Net actually became aware of the new mobility pattern as early as March, the very beginning of the epidemic in the US. These observations reconfirm the event-awareness and adaptivity of EAST-Net, particularly under a long-term event imposing a lasting impact.

In this paper, we tackle the multimodal mobility nowcasting problem in response to various event scenarios. By designing a heterogeneous mobility information network for explicitly representing intermodality and a memory-augmented dynamic filter generator for producing sequence-specific parameters on the fly, we propose an event-aware spatio-temporal network (EAST-Net). Extensive experiments on four real-world datasets verify the event-awareness and adaptivity of EAST-Net, which is applicable even to unprecedented events. As a next step, we plan to improve the speed of adaptation and the efficiency of EAST-Net.
Neural Machine Translation by Jointly Learning to Align and Translate
Stacked Graphs - Geometry & Aesthetics
Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting
EnhanceNet: Plugin Neural Networks for Enhancing Correlated Time Series Forecasting
Decentralized Attention-based Personalized Human Mobility Prediction
CityMomentum: An Online Approach for Crowd Behavior Prediction at a Citywide Level
DeepMove: Predicting Human Mobility with Attentional Recurrent Networks
Dynamic Filter Networks
DeepUrbanMomentum: An Online Deep-Learning System for Short-Term Urban Mobility Prediction
DeepUrbanEvent: A System for Predicting Citywide Crowd Dynamics at Big Events
Countrywide Origin-Destination Matrix Prediction and Its Application for COVID-19
CityProphet: City-Scale Irregularity Prediction Using Transit App Logs
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Urban Traffic Prediction from Spatio-Temporal Data Using Deep Meta Learning
Meta-Learning with Memory-Augmented Neural Networks
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Prediction of Human Emergency Behavior and Their Mobility Following Large-Scale Disaster
Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values
Attention is All You Need
Continual Learning with Hypernetworks. In International Conference on Learning Representations (ICLR '20)
Origin-Destination Matrix Prediction via Graph Convolution: A New Perspective of Passenger Demand Modeling
Forecasting Ambulance Demand with Profiled Human Mobility via Heterogeneous Multi-Graph Neural Networks
Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks
Graph WaveNet for Deep Spatial-Temporal Graph Modeling
Deep Graph Convolutional Networks for Incident-Driven Traffic Speed Prediction
MobTCast: Leveraging Auxiliary Trajectory Forecasting for Human Mobility Prediction
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Learning from Multiple Cities: A Meta-Learning Approach for Spatial-Temporal Prediction
Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction
Co-Prediction of Multiple Transportation Demands Based on Deep Spatio-Temporal Neural Network
Detecting Urban Anomalies Using Multiple Spatio-Temporal Data Sources
Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction
A Decomposition Approach for Urban Anomaly Detection across Spatiotemporal Data
GMAN: A Graph Multi-Attention Network for Traffic Prediction
Decoupled Dynamic Filter Networks
Periodic-CRN: A Convolutional Recurrent Model for Crowd Density Prediction with Recurring Periodic Patterns

This work was partially supported by JSPS KAKENHI (JP20K19859), JST Strategic International Collaborative Research Program (SICORP) (JPMJSC2002, JPMJSC2104), and Australian Research Council (ARC) Discovery Project (DP190101485). We are also appreciative of the open POI data (i.e. Weekly Patterns) SafeGraph has made available.