title: Graph-based Deep Learning for Communication Networks: A Survey
authors: Jiang, Weiwei
date: 2021-06-04
DOI: 10.1016/j.comcom.2021.12.015

Communication networks are important infrastructures in contemporary society. There are still many challenges that are not fully solved and new solutions are proposed continuously in this active research area. In recent years, to model the network topology, graph-based deep learning has achieved state-of-the-art performance in a series of problems in communication networks. In this survey, we review the rapidly growing body of research using different graph-based deep learning models, e.g., graph convolutional and graph attention networks, in various problems from different types of communication networks, e.g., wireless networks, wired networks, and software defined networks. We also present a well-organized list of the problem and solution for each study and identify future research directions. To the best of our knowledge, this paper is the first survey that focuses on the application of graph-based deep learning methods in communication networks involving both wired and wireless scenarios. To track the follow-up research, a public GitHub repository is created, where the relevant papers will be updated continuously.

Communication networks are ubiquitous in contemporary society, from the widely used Internet and 4G/5G cellular networks to the fast-growing Internet of Things. The main contributions of this paper are summarized as follows. 1) Comprehensive Review: to the best of the authors' knowledge, this paper presents the first literature review of graph-based deep learning methods applied to communication networks, covering both wired and wireless scenarios. 2) Well-organized Summary: We summarize the problem to solve, the graph-based solution and the GNNs used in each study in a unified format, which would be useful as a reference manual. 3) Future Directions: We propose several potential future directions for researchers interested in relevant topics. For reference, the list of the acronyms frequently used in this survey is summarized in Table 1.

The remainder of this paper is organized as follows. In Section 2, we introduce the process of literature search and selection. In Section 3, we introduce the GNNs used in the reviewed studies. In Section 4, we summarize the studies in wireless networks. In Section 5, we summarize the studies in wired networks. In Section 6, we summarize the studies in software defined networks. In Section 7, we point out future directions. In Section 8, we draw the conclusion.

To collect relevant studies, the literature is searched with various combinations of two groups of keywords. The first group concerns the graph-based deep learning techniques, e.g., "Graph", "Graph Embedding", "Graph Neural Network", "Graph Convolutional Network", "Graph Attention Network", "GraphSAGE", "Message Passing Neural Network", "Graph Isomorphism Network", etc. The second group concerns the communication networks as well as specific problems, e.g., "Wireless Network", "Cellular Network", "Computer Network", "Software Defined Networking", "Traffic Prediction", "Routing", "Service Function Chaining", etc. Most of the selected papers have been published in recent years for solving networking-related problems, with only one paper selected for most journals or conferences. The journals include the International Journal of Network Management [44], Performance Evaluation [45], Sensors [46], and Transactions on Emerging Telecommunications Technologies [47], among others [42, 43]. The conferences include the International Conference on Information Networking (ICOIN) [70, 71], the International Conference on Information and Communication Technology Convergence (ICTC) [72], the International Conference on Network and Service Management (CNSM) [73, 74, 75], the International Conference on Real-Time Networks and Systems (RTNS) [76], the International Conference on Wireless Communications and Signal Processing (WCSP) [77], the International Conference on emerging Networking EXperiments and Technologies (CoNEXT) [78], the International Symposium on Networks, Computers and Communications (ISNCC) [79], and the Opto-Electronics and Communications Conference (OECC) [80].

3. Graph-based Deep Learning Introduction

In this section, we first present some typical examples of the graph structures used in communication networks. Then we give a short introduction of the graph-based deep learning models, especially those used in the surveyed papers. Finally, we discuss the pros and cons of applying graph-based deep learning models in the networking domain.

From graph theory, a simple graph is defined as G = (V, E), where V is the set of nodes and E is the set of edges between nodes. In communication networks, the edges can be either directed or undirected, depending on the specific problems. Both nodes and edges can be associated with attributes as their features, either static or dynamic. Two graph examples are given for the wired and wireless scenarios, respectively. In Figure 4, the communication graph from the Abilene network is presented, which consists of 11 nodes and 14 edges. Each node represents a physical backbone router and the node features include the inflow and outflow traffic volumes. Each edge represents a physical transmission link and the edge features include the transmission metrics, e.g., bandwidth and delay. Similar communication graphs built from other network topologies, e.g., the Nobel, GÉANT, Germany50, and AT&T backbone networks, can be found in [47, 59, 33]. In Figure 5, the interference graph for a homogeneous ad-hoc network is presented, which consists of 3 nodes and 3 edges. Different from Figure 4, each node in Figure 5 represents a transceiver pair and its features include the channel state information (CSI) of the direct link h_ii [82]. The undirected edge between node i and node j models the interference between two transceiver pairs and the edge features are the interference CSIs h_ij and h_ji. The interference graph built for the heterogeneous ad-hoc network case can be further found in [88].

An adjacency matrix A is introduced to incorporate the network topology information into the architecture of neural networks. Let e_ij denote the edge between node v_i and node v_j. Then the elements of the adjacency matrix A are defined as follows: A_ij = 1 if e_ij ∈ E; otherwise, A_ij = 0. Here the binary matrix A only captures the connection relationship. If A is symmetric, the graph is undirected; otherwise, the graph is directed. More complex adjacency matrices can be defined similarly, e.g., the distance matrix or the interference matrix. For defining the GNNs in the next part, more notations are introduced here. N is the number of nodes and I_N is the identity matrix of size N. The node feature matrix of a graph is defined as X ∈ R^{N×d}, where d is the dimension of the node feature vector.

Since the research on graph-based deep learning is still progressing at a fast pace with new models appearing continuously, we have no intention of conducting a thorough literature search on the graph-based models themselves. In this section, we focus on a short introduction of the GNNs used in the surveyed studies. For those who are interested in the whole picture of graph neural networks and a deeper discussion of the technical details, recent surveys [11, 12, 13, 14, 15] are recommended.
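To make the notation above concrete, the following minimal Python sketch builds a binary adjacency matrix A and a node feature matrix X for a small toy topology. The edge list, node count, and traffic-volume values are illustrative placeholders (not taken from the Abilene dataset), and the snippet only illustrates the definitions given above.

```python
import numpy as np

# Toy undirected communication graph: 4 routers, 4 links.
# The edge list and traffic volumes below are illustrative placeholders.
num_nodes = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Binary adjacency matrix A: A[i, j] = 1 if an edge exists, 0 otherwise.
A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = 1.0
    A[j, i] = 1.0  # undirected graph, so A is symmetric

# Node feature matrix X in R^{N x d}: here d = 2 (inflow, outflow volumes).
X = np.array([
    [10.0, 12.0],
    [ 8.0,  9.0],
    [15.0, 14.0],
    [ 7.0,  6.0],
])

assert np.allclose(A, A.T), "a symmetric A corresponds to an undirected graph"
print(A)
print(X.shape)  # (4, 2)
```

For a directed graph, only the (i, j) entry would be set for each edge, so A would in general no longer be symmetric.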
The relevant graph-based deep learning models are listed chronologically in Figure 6. Please note that the listed publication venues may lag behind the preprint versions, which could be released one or two years earlier. As a pioneering study, GNN is introduced in [89], which extends the application of neural networks from Euclidean-structured data to non-Euclidean-structured data. GNN is based on the message passing mechanism, in which each node updates its state by exchanging information with its neighbors until a certain stable state is reached. Afterwards, various GNN variants have been proposed, e.g., graph convolutional and graph attention networks, which are introduced below.

We first introduce the Graph Embedding (GE) models. In mathematics, an embedding is a mapping function f : X → Y, in which a point in one space X is mapped to another space Y. Embedding is usually performed from a high-dimensional abstract space to a low-dimensional space. Generally speaking, the representation mapped to the low-dimensional space is easier for neural networks to handle. In the case of graphs, graph embedding is used to transform nodes, edges, and their features into the vector space, while preserving properties like the graph structure and information as much as possible. For the studies covered in this survey, several graph embedding models are involved, including structure2vec [90], GraphSAGE [91], and GE [92]. As a transductive learning approach, structure2vec [90] is based on the idea that if the two sequences composed of all the neighbors of two nodes are similar, then the two nodes are similar. GraphSAGE [91] is a representative of inductive learning. It does not directly learn the representation of each node, but learns the aggregation function instead. For a new node, its embedding representation is generated directly without the need to learn again. Furthermore, a novel adversarially regularized framework is proposed for graph embedding in [92].

Then we introduce the GCN models. GCN extends the convolution operation from traditional data (such as images) to graph data, inspired by the convolutional neural networks which are extremely successful for image-based tasks. The core idea is to learn a function mapping, through which a node can aggregate its own features and the features of its neighbors to generate its new representation. Generally speaking, there are two types of GCN models, namely, spectral-based and spatial-based. Based on graph signal processing, spectral-based GCNs define the convolution operation in the spectral domain, e.g., the Fourier domain. To conduct the convolution operation, a graph signal is transformed to the spectral domain by the graph Fourier transform. Then the result after the convolution is transformed back by the inverse graph Fourier transform. Several spectral-based GCNs are used in the surveyed studies, e.g., GNN [93], ChebNet [94], and GCN [95], which improve the convolution operation with different techniques. By introducing a parameterization with smooth coefficients, GNN [93] attempts to make the spectral filters spatially localized. ChebNet [94] learns a diagonal matrix as an approximation of a truncated expansion in terms of Chebyshev polynomials up to the K-th order. To avoid overfitting, K = 1 is used in GCN [95]. More specifically, the graph convolution operation *_G in GCN is defined as follows:

X *_G W = (I_N + D^{-1/2} A D^{-1/2}) X W,

where D is the diagonal degree matrix with D_ii = Σ_j A_ij, and W is a learnable weight matrix, i.e., the model parameters.
To alleviate the potential gradient explosion problem, the graph convolution operation is further transformed with the renormalization trick into:

X *_G W = D̃^{-1/2} Ã D̃^{-1/2} X W,

where Ã = A + I_N and D̃ is the diagonal degree matrix of Ã with D̃_ii = Σ_j Ã_ij.

Several spatial-based GCNs are also used in the surveyed studies, which define the convolution operation directly on the graph based on the graph topology. To unify different spatial-based variants, the Message Passing Neural Network (MPNN) [96] proposes the usage of message passing functions, which contain a message passing phase and a readout phase. The message passing phase is defined as follows:

m_i^{(t+1)} = Σ_{v_j ∈ N(v_i)} M^{(t)}(X_i^{(t)}, X_j^{(t)}, e_ij),

where m_i^{(t+1)} is the message aggregated from the neighbors N(v_i) of node v_i, M^{(t)}(·) is the aggregation function in the t-th iteration, X_i^{(t)} is the hidden state of node v_i in the t-th iteration, and e_ij is the edge feature vector between node v_i and node v_j. The readout phase is further defined as follows:

X_i^{(t+1)} = U^{(t)}(X_i^{(t)}, m_i^{(t+1)}),

where U^{(t)}(·) is the readout function in the t-th iteration.

Graph Network (GN) [97] also unifies many GNN variants, by learning node-level, edge-level and graph-level representations. Graph Isomorphism Network (GIN) [98] takes a step further by pointing out that previous MPNN-based methods are incapable of distinguishing different graph structures based on the graph embeddings they produce; it amends this drawback by adjusting the weight of the central node with a learnable parameter.

Attention-based GNN models can be categorized into the spatial-based type. GAT [99] incorporates the attention mechanism into the propagation step and further utilizes the multi-head attention mechanism to stabilize the learning process, which is defined as follows:

X_i' = ||_{k=1}^{K} σ( Σ_{v_j ∈ N(v_i)} α_ij^k W^k X_j ),

where || is the concatenation operation, σ is the activation function, α_ij^k is the attention coefficient computed by the k-th attention mechanism α^k(·), and W^k is the corresponding weight matrix.

Other than the convolution operation, the recurrent operation can also be applied in the propagation module of GNNs. The key difference is that the convolution operations use different weights in different layers while the recurrent operations share the same weights. For example, Gated Graph Sequence Neural Network (GGS-NN) [100] uses Gated Recurrent Units (GRU) in the propagation step.

In realistic networks, the network topology may change occasionally, e.g., with the addition or deletion of routers, which corresponds to the case of dynamic graphs, instead of static graphs. Several GNN variants are proposed for dealing with dynamic graphs. Diffusion Convolutional Recurrent Neural Network (DCRNN) [101] leverages GNNs to collect the spatial information, which is further used in sequence-to-sequence models. By extending the static graph structure with temporal connections, Structural-RNN (S-RNN) [102] can learn the spatial and temporal information simultaneously.

The last case to discuss is the heterogeneous graph, in which the nodes and edges are multi-typed or multi-modal. For this case, the meta-path is introduced as a path scheme which determines the type of node in each position of the path; one heterogeneous graph can then be reduced to several homogeneous graphs on which graph learning algorithms are performed. In the Heterogeneous Graph Attention Network (HetGAT) [103], graph attention is performed on the meta-path-based neighbors and semantic attention is applied over the output embeddings of nodes under all meta-path schemes to generate the final node representations.

Machine learning has emerged as a new paradigm to solve various networking problems and to automate network management [50]. Compared with traditional methods, ML models provide many benefits for solving networking-relevant problems.
The first advantage is that machine learning models can automatically learn and improve from experience without being explicitly programmed [50]. Even though it takes some effort to train a machine learning model, the inference time when applying a trained model is much smaller. Comparable efforts are also inevitable when applying a traditional method based on various optimization techniques, which may require a long iterative update process. The second advantage is that machine learning models are more effective than statistical and heuristic methods in learning from wide and dynamically changing data. Based on these advantages, machine learning models, especially deep learning models, have been widely applied in the networking domain.

The story does not end here. Machine learning has achieved great success in many research fields, e.g., computer vision, natural language processing and time series processing. Most of these fields use Euclidean-domain data, for which feed-forward neural networks, CNNs and RNNs are sufficient. However, for other fields, e.g., chemistry and biology, these models are inadequate for learning from non-Euclidean graph data, which contain rich relational information between each pair of neighboring elements. Many kinds of graph-structured data also exist in communication networks, as introduced earlier, which are beyond the ability of non-GNN machine learning models. Driven by the graph-structured data, GNNs are preferable because GNNs can automatically learn a condensed representation of each node in the network that incorporates the information about the node, its neighbors, and their interconnecting topology. It has also been shown that GNNs have a higher training efficiency than other neural networks. However, there are also concerns when applying GNNs in the networking domain.

One concern is about the depth of GNN models. For other neural networks, e.g., CNNs, it has been proven effective to use a deeper structure, e.g., ResNet. However, similar benefits are not obvious for GNNs. It has been found that when using more than two GCN layers, the performance becomes worse as more GCN layers are stacked. Because GNNs rely on aggregating the features of neighboring nodes, the node representations become too smooth and lack differentiation after multiple layers. As layers continue to stack, eventually all nodes learn the same representation and GNNs fail to work. It is still questionable whether graph neural networks need a deep structure, or whether a deep network structure can be designed to avoid the over-smoothing problem in the networking domain (a minimal numerical sketch of this over-smoothing effect is given below, after the stability discussion).

Another concern is about the stability of GNN models, both under stochastic perturbations and under adversarial attacks [105, 106, 107]. Stochastic perturbations appear in communication networks when link failures and congestion happen, while adversarial attacks appear when the underlying networks are deliberately targeted. These problems already exist for other neural networks, and more attack types can be designed by leveraging the node features or the graph structure. It has been found that the stability of GNNs is affected by multiple factors, e.g., the graph filter, nonlinearity, architecture width and depth, etc. [106]. Massive efforts have been devoted to designing GNNs that are robust to perturbations or attacks. With the deeper involvement of GNNs in various networking problems, more potentially vulnerable cases will appear, which will require the design of more robust GNNs.
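Returning to the depth concern above, the following minimal NumPy sketch repeatedly applies the renormalized propagation operator D̃^{-1/2} Ã D̃^{-1/2} from Section 3 to a toy graph. As a simplification, learned weights and nonlinearities are omitted, and the graph and features are purely illustrative; the point is only that the spread between node features shrinks as propagation steps accumulate, which is the over-smoothing effect.

```python
import numpy as np

# Toy graph: 5 nodes in a ring; features are random and purely illustrative.
rng = np.random.default_rng(0)
N = 5
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

# Renormalization trick: A_tilde = A + I, D_tilde built from A_tilde.
A_tilde = A + np.eye(N)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
P = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # propagation operator

X = rng.normal(size=(N, 3))  # initial node features, d = 3

def spread(H):
    """Maximum pairwise distance between node feature vectors."""
    return max(np.linalg.norm(H[i] - H[j]) for i in range(N) for j in range(N))

H = X.copy()
for k in range(1, 11):
    H = P @ H  # one propagation step (no weights/nonlinearity for clarity)
    print(f"after {k:2d} steps, feature spread = {spread(H):.4f}")
# The spread shrinks toward zero: node representations become indistinguishable.
```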
The last but not the least concern is about the explainability of GNNs for networking problems. The study of the explainability and visualization of deep learning models has a long history, and deep learning has been criticized for its "black-box" property. The graph structure brings new challenges for the explainability problem. Some progress has been made on post-processing techniques to explain the predictions made by GNNs; however, the explainability of GNNs in the networking domain has not yet been fully addressed [108, 109].

In this section, we focus on the relevant studies in wireless network scenarios. For wireless networks, we refer to those transmitting information through wireless data connections without using a cable, including the wireless local area network, cellular network, wireless ad hoc network, cognitive radio network, device-to-device (D2D) network, satellite network, vehicular network, etc. Some problems are ubiquitous in different forms of wireless networks, e.g., power control. We first discuss these problems in general wireless network scenarios and then discuss the papers focusing on a specific wireless network scenario. Compared with other deep learning models, GNNs have the advantage of handling the topology information, which may not be leveraged in previous studies with Euclidean deep learning models.

In densely deployed wireless local area networks, the channel resource is limited. To increase the system throughput, an unsupervised primal-dual counterfactual optimization approach is proposed in [84] to learn optimal power allocation decisions, in which GNNs are used to handle the network topology. To sum up, the papers in the general wireless network scenario are listed in Table 5. The target problem, proposed solution and the relevant GNN component(s) are also listed. The same tabular format for the paper summary applies in the following sections.

Energy consumption is another concern for the 5G network, which is designed to enable a denser network with microcells, femtocells and picocells. To better control the transmission power, GNN-based power control solutions are proposed in [66, 68]. Heterogeneous GNNs (HetGNNs) with a novel parameter sharing scheme are proposed for power control in multi-user multi-cell networks [66]. Taking a step further, the joint optimization problem of user association and power control of the downlink is considered in [68], in which an unsupervised GNN is used for power allocation and the spectral clustering algorithm is used for user association.

Green network management is proposed to improve energy efficiency. A specific problem, Idle Time Windows (ITWs) prediction, is considered in [29]. To capture the spatio-temporal features, a novel Temporal Graph Convolutional Network (TGCN) is proposed for learning the network representation, which improves the prediction performance. Also targeting denser cell sites, the Integrated Access and Backhaul (IAB) architecture defined by the 3rd Generation Partnership Project (3GPP) is used in [26]. The IAB topology design is formulated as a graph optimization problem and a combination of deep reinforcement learning and graph embedding is proposed for solving this problem efficiently. The integration of satellite-terrestrial networks is proposed for the future 6G network.
In this direction, a High Altitude Platform Station (HAPS) is a network node that operates in the stratosphere at an altitude of around 20 km and is instrumental in providing communication services [113]. For HAPS, GAT is first utilized for channel estimation in [114, 62], and the proposed GAT estimator outperforms the traditional least squares method in full-duplex channel estimation and is also robust to hardware imperfections and changes in small-scale fading characteristics.

As a softwarized concept, network slicing has been proposed for the 5G network, using network virtualization to divide a single network connection into multiple distinct virtual connections that provide services with different Quality-of-Service (QoS) requirements. For network slicing management, deep reinforcement learning and GCN are combined for solving this problem [79, 52], in which the episodic Markov Decision Process is solved by different GCN models. To sum up, the papers in the cellular network scenario are listed in Table 6.

In this part, we discuss the other forms of wireless networks, with their own challenges and solutions. The first case is the cognitive radio network, which aims to increase the spectrum utilization by secondary users with an opportunistic use of the free spectrum that is not used by the primary users. In this scenario, the challenge is to improve the resource utilization without degrading the quality of service (QoS) of primary users. To solve this challenge, a joint channel selection and power adaptation scheme is proposed in [46], in which GCN is leveraged to extract the crucial interference features. Based on the estimated CSI, a DRL-based framework is further used to allocate spectrum resources efficiently.

The second case is the Device-to-Device (D2D) network, which uses direct communication links between devices. The wireless link scheduling problem in D2D networks is considered in [61, 43, 65]. A graph embedding based method is proposed in [61, 43], in which the graph embedding process is based on the distances of both communication and interference links, without requiring accurate CSI. The proposed method manages to reduce the computational complexity of the link scheduling problem significantly.

The third case is the Internet of Things (IoT) network, which is designed for connecting smart devices, e.g., smart meters, smart light bulbs, connected valves and pumps, etc. The applications of IoT networks cover a wide range, e.g., smart factory, smart agriculture, smart city, etc. The wide application also raises a great number of challenges, e.g., resource utilization efficiency, battery limitation for computation and communication, and security concerns. Some of these challenges can be solved with graph-based methods. One example is the channel estimation problem considered in [114], in which GAT is used for reconfigurable intelligent surface-assisted direct-to-satellite IoT communications. In a related direction, for large-scale vehicular networks, a graph-based framework named SMART is proposed in [116], in which GCN is combined with a deep Q-network algorithm to capture the spatial and temporal patterns within a limited observation zone. The latency performance is then reconstructed for the whole geographical area. To sum up, the papers in other wireless network scenarios are listed in Table 7.

For wired networks, we mainly refer to the computer networks that are connected with cables, such as those connecting laptop or desktop computers. A typical example is the Ethernet network. In this section, we first discuss the graph-based studies in the wired network scenario from five aspects, namely, network modeling, network configuration, network prediction, network management, and network security. Then three special cases are further discussed, i.e., the blockchain platform, the data center network, and the optical network.
GNNs are suitable for network modeling as computer networks are often modeled as graphs. With the continuing growth of the contemporary Internet, it becomes more and more challenging to understand the overall network topology, the architecture and different elements of the networks, and their configurations. To solve this challenge, GNNs are proposed for network modeling. They are not only used to reconstruct the existing networks, but also used to model non-existing networks, in order to provide an estimation of unseen cases so that network operators can make better network deployment decisions in the future. By modeling networks, the estimation of different end-to-end metrics is of concern in the surveyed studies, given the input network topology, routing scheme and traffic matrices of the network, in a supervised [48, 78, 117, 45] or semi-supervised [71] way. Delay and jitter are considered in [48, 78, 117, 71], while the throughput of TCP flows and the end-to-end latency of UDP flows are considered in [45]. Different GNNs are used for the network modeling purpose, including GGS-NN in [45], MPNN in [48, 78], GN and GNN in [117], and GCN in [71]. GNNs are also used for network calculus analysis in [53, 38, 64].

Based on the modeling ability of GNNs, they are further proposed for network configuration feasibility analysis and decision. Based on the prediction of an ensemble GNN model, different network configurations are evaluated in [76], bound to the deadline constraints. Border Gateway Protocol (BGP) configuration synthesis is considered in [86]; BGP is the standard inter-domain routing protocol to exchange reachability information among Wide Area Networks (WANs). GNN is adopted to represent the network topology with partial network configuration in a system named DeepBGP, which is further validated for both Huawei and Cisco devices while fulfilling operator requirements. Another relevant study uses GNN for Multiprotocol Label Switching (MPLS) configuration analysis. A GNN-based solution named DeepMPLS is proposed in [69] to speed up the analysis of network properties as well as to suggest configuration changes in case a network property is not satisfied. The GNN-based solution manages to achieve low execution times and high accuracies in real-world network topologies.

GNNs can also be used for network prediction, e.g., delay prediction [27] and traffic prediction [47, 59, 118]. Better prediction is the basis of proactive management. A case study of delay prediction in queuing networks is conducted in [27], which uses MPNN for topology representation and network operation. Several studies concern data-driven traffic prediction, based on real-world network traffic data and GNN-based solutions. A framework named Spatio-temporal Graph Convolutional Recurrent Network (SGCRN) is proposed in [47], which combines GCN and GRU and is validated on the network traffic data from four real IP backbone networks. Another framework named Multi-scale Spatial-temporal Graph Neural Network (MSTNN) is proposed for Origin-Destination Traffic Prediction (ODTP), and two real-world datasets are used for evaluation [59]. Inspired by the prediction model DCRNN [101] developed for road traffic, a nonautoregressive graph-based neural network is used in [118] for network traffic prediction and evaluated on the U.S. Department of Energy's dedicated science network.
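To illustrate how graph convolutions and recurrent units can be combined for traffic prediction, the following PyTorch sketch stacks a single GCN-style propagation step with a GRU cell shared across nodes. This is a generic toy example under assumed shapes and random data; the class name ToyGraphGRU, the 4-node ring topology, and all dimensions are hypothetical, and it is not the SGCRN or MSTNN architecture from the cited papers.

```python
import torch
import torch.nn as nn

class ToyGraphGRU(nn.Module):
    """One GCN-style propagation step followed by a per-node GRU cell."""
    def __init__(self, num_nodes, in_dim, hidden_dim, adj):
        super().__init__()
        # Renormalized adjacency D^{-1/2}(A + I)D^{-1/2}, precomputed once.
        a_tilde = adj + torch.eye(num_nodes)
        d_inv_sqrt = torch.diag(a_tilde.sum(dim=1).pow(-0.5))
        self.register_buffer("prop", d_inv_sqrt @ a_tilde @ d_inv_sqrt)
        self.lin = nn.Linear(in_dim, hidden_dim)       # graph convolution weights
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)  # shared across nodes
        self.out = nn.Linear(hidden_dim, 1)            # next-step traffic volume

    def forward(self, x_seq):
        # x_seq: (time, num_nodes, in_dim) sequence of node features.
        h = torch.zeros(x_seq.size(1), self.gru.hidden_size)
        for x_t in x_seq:
            z_t = torch.relu(self.prop @ self.lin(x_t))  # spatial aggregation
            h = self.gru(z_t, h)                         # temporal update
        return self.out(h)  # (num_nodes, 1) prediction for the next step

# Toy usage with a 4-node ring topology and random traffic features.
adj = torch.tensor([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=torch.float)
model = ToyGraphGRU(num_nodes=4, in_dim=2, hidden_dim=8, adj=adj)
history = torch.randn(12, 4, 2)  # 12 past time steps, 2 features per node
print(model(history).shape)      # torch.Size([4, 1])
```

The design point shown here is that the graph convolution mixes information across neighboring nodes at each time step, while the shared GRU cell carries each node's state forward in time.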
Network prediction results can be further used for network operation optimization and management [119], e.g., traffic engineering, load balancing, routing, etc. At the time of preparing this survey, routing has been considered with graph-based deep learning models in [85, 87]. Instead of using reinforcement learning, a novel semi-supervised architecture named Graph-Query Neural Network is proposed in [85] for shortest path and max-min routing. Another graph-based framework named NGR is proposed in [87] for shortest-path routing and load balancing. These graph-based routing solutions are validated with use cases and show high accuracy and resilience to packet loss.

Last but not least, graph-based deep learning solutions are used for network security problems in computer networks [81, 24]. Automatic detection of botnets, which are the source of DDoS attacks and spam, is considered in [81]. GNN is used to detect the patterns hidden in the botnet connections and is proven more effective than non-learning methods. Their dataset is also made available for future studies. In another study, a GNN-based intrusion detection system is considered in [24]. To sum up, the papers in the wired network scenario are listed in Table 8.

Other than the general computer network case, three specific network cases are discussed with graph-based methods. The first case is the blockchain platform, which is well known to the public thanks to Bitcoin, the most famous cryptocurrency. Generally speaking, the blockchain is a chain of blocks that store information with digital signatures in a decentralized and distributed network, and it has a wide range of applications other than digital cryptocurrencies, e.g., financial and social services, risk management, healthcare facilities, etc. [120]. In this scenario, a graph is constructed as the representation of encrypted Decentralized Application (DApp) flows as well as the input for GNNs, and real-world traffic datasets collected from 1,300 DApps are used for evaluation.

The last case is the optical network, which uses light signals, instead of electronic ones, to send information between two or more points. There are many unique problems when light signals are used for communication, e.g., wavelength assignment. The optimal resource allocation in a special network type, i.e., the Free Space Optical (FSO) fronthaul network, is considered in [121], and GNNs are used for evaluating and choosing the resource allocation policy. The routing optimization for an Optical Transport Network (OTN) scenario is considered in [122], where the learning and generalization capabilities of GNNs are combined with DRL for routing in unseen network topologies. Similar to cellular and computer networks, traffic prediction is also considered in the optical network scenario [80], with a solution combining GCN and GRU. To sum up, the papers in other wired network scenarios are listed in Table 9.

SDN emerges as the most promising solution for bringing a revolution in how networks are built. Based on the white paper released by the Open Networking Foundation (ONF), the explosion of mobile devices and content, server virtualization, and the advent of cloud services are among the trends driving the re-examination of traditional network architectures and network planning [49, 30]. The decoupling of the control plane and the data plane gives more computing power for routing optimization. Based on this observation, an intelligent routing strategy based on graph-aware neural networks is designed in [33], in which a novel graph-aware convolution structure is constructed to learn topological information efficiently.
In another study of routing optimization, a GN-based solution is proposed for maximum bandwidth utilization, which achieves a satisfactory accuracy and a prediction time 150 times faster than the Genetic Algorithm (GA) [70].

In SDN, network virtualization is a powerful way to efficiently utilize the network infrastructure. Virtual Network Functions (VNFs) are virtualized network services running on physical resources. How to map VNFs into shared substrate networks has become a challenging problem in SDN, known as Virtual Network Embedding (VNE) or VNF placement, which is already proven to be NP-hard. To efficiently solve this problem, a number of heuristic algorithms have been proposed in the literature. Recently, graph-based models have also been used for this problem [75, 39, 73, 44, 74, 50, 25], which can obtain near-optimal solutions in a short time. To predict future resource requirements for VNFs, a GNN-based algorithm using the VNF forwarding graph topology information is proposed in [75, 39]. Deployed in a virtualized IP multimedia subsystem and tested with real VoIP traffic traces, the new algorithm achieves an average prediction accuracy of 90% and improves the call setup latency by over 29%, compared with the case without using GNNs. A parallelizable VNE solution based on spatial GNNs is proposed for accelerating the embedding process in [74], which improves the revenue-to-cost ratio by about 18%, compared to other simulated algorithms. Similarly, GNN-based algorithms are proposed for VNF resource prediction and management in a series of studies [73, 44, 50].

On another aspect, DRL is often combined with GNNs for automatic virtual network embedding [54, 31, 60, 19]. An asynchronous-DRL-enhanced GNN is proposed in [54] for topology-aware VNF resource prediction in dynamic environments. An efficient algorithm combining DRL with GCN is proposed in [31], with up to 39.6% and 70.6% improvements in acceptance ratio and average revenue, compared with the existing state-of-the-art solutions. A more specific problem, i.e., traffic flow migration among different network function instances, is considered in [60, 19], in which GNN is used for migration latency modeling and DRL is used for deploying dynamic and effective flow migration policies.

Last but not least, Service Function Chaining (SFC) is considered in several studies [124, 51, 72, 63]. SFC uses SDN's programmability to create a service chain of connected virtual network services, resulting in a service function path that provides an end-to-end chain and traffic steering through it. Graph-structured properties of the network topology can be extracted by GNNs, which outperform DNNs for SFC [51, 72]. However, most of the existing studies for SFC use a supervised learning approach, which may not be suitable for dynamic VNF resources, various requests, and changes of topologies. To solve this problem, DRL is applied for training models on various network topologies with unlabeled data in [124] and achieves remarkable flexibility in new topologies without re-designing and re-training, while preserving a similar level of performance compared to the supervised learning method. DRL is also used for adaptive SFC placement to maximize the long-term average revenue [63]. To sum up, the papers in the SDN scenario are listed in Table 10.

In this section, we discuss some future directions for graph-based deep learning in communication networks. The first research direction is the combination of GNNs and other artificial intelligence techniques.
Some examples have already been seen in this survey, e.g., the combination of GNN and GRU for traffic prediction [77, 80], and the combination of GNN and DRL for resource allocation [46], routing [122], and VNE [31]. The advantages of GNNs include the learning ability for topological dependencies and the generalization capability for unseen network topologies, but GNN is not a panacea. For example, for cases in which training data are lacking or too expensive to collect, Generative Adversarial Networks (GANs) [125] are a possible solution. Even though GANs have been widely used in other fields, e.g., image and video, the combination of GANs and GNNs [126] has not been applied to communication networks, at least within the scope of this survey. Another example is the Automated Machine Learning (AutoML) technique [127], which can be used for optimizing the GNN parameters automatically.

Another research direction is to apply graph-based deep learning to larger networks. In most of the surveyed studies, the network topology is small, e.g., fewer than 100 nodes, compared with contemporary networks. However, the modeling of larger networks would impose huge computation requirements. Graph partitioning and parallel computing infrastructures are two possible solutions for this problem. A larger network may be decomposed into smaller ones that are within the computing capacity. However, the optimal divide-and-conquer approach remains unknown and may vary in different network scenarios. Another concern is whether it is worthwhile to achieve narrow performance margins at the cost of the increased computation burden caused by graph-based models, compared with traditional methods.

In this paper, a survey is presented for the application of graph-based deep learning in communication networks. The relevant studies are organized in three network scenarios, namely, wireless networks, wired networks, and software defined networks.

References

[41] Y. Yan, B. Zhang, C. Li, C. Su, Cooperative caching and fetching in D2D communications - a fully decentralized multi-agent reinforcement learning approach.
Wireless networks design in the era of deep learning: Model-based, AI-based, or both?
Deep learning in mobile and wireless networking: A survey
Thirty years of machine learning: The road to Pareto-optimal wireless networks
Deep learning for network traffic monitoring and analysis (NTMA): A survey
A comprehensive survey on graph neural networks, IEEE Transactions on Neural Networks and Learning Systems
Graph neural networks: A review of methods and applications
Deep learning on graphs: A survey
Graph learning: A survey
Graph neural networks: Architectures, stability, and transferability
Graph neural network for traffic forecasting: A survey
Combining deep reinforcement learning with graph neural networks for optimal VNF placement
IAB topology design: A graph embedding and deep reinforcement learning approach
Message-passing neural networks learn Little's law
On dynamic service function chain reconfiguration in IoT networks
Idle time window prediction in cellular networks with deep spatiotemporal modeling
RouteNet: Leveraging graph neural networks for network modeling and optimization in SDN
Automatic virtual network embedding: A deep reinforcement learning approach with graph convolutional networks
Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis
Unfolding WMMSE using graph neural networks for efficient power allocation
Graph embedding based wireless link scheduling with few training samples
Graph neural network-based virtual network function deployment optimization
DeepComNet: Performance evaluation of network topologies using graph-based deep learning
A graph convolutional network-based deep reinforcement learning approach for resource allocation in a cognitive radio network
Spatiotemporal graph convolutional recurrent networks for traffic matrix prediction
Challenging the generalization capabilities of graph neural networks for network modeling
Unveiling the potential of graph neural networks for network modeling and optimization in SDN
Graph neural network-based virtual network function management
Graph neural network based service function chaining for automatic network control
Learn to improve: A novel deep reinforcement learning approach for beyond 5G network slicing
DeepTMA: Predicting effective contention models for network calculus using graph neural networks
Deep reinforcement learning for topology-aware VNF resource prediction in NFV environments
Resource allocation based on graph neural networks in vehicular communications
Graph attention spatial-temporal network for deep learning based mobile traffic prediction
Efficient power allocation using graph neural networks and deep algorithm unfolding
Transferable policies for large scale wireless networks with graph neural networks, in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing
MSTNN: A graph learning based method for the origin-destination traffic prediction, IEEE International Conference on Communications (ICC)
DeepMigration: Flow migration for NFV with graph-based deep reinforcement learning
Wireless link scheduling for D2D communications with graph embedding technique
Channel estimation for full-duplex RIS-assisted HAPS backhauling with graph attention networks
DRL-SFCP: Adaptive service function chains placement with deep reinforcement learning
On the robustness of deep learning-predicted contention models for network calculus
Wireless D2D network link scheduling based on graph embedding
Learning power control for cellular systems with heterogeneous graph neural network
Graph attention network-based DRL for network slicing management in dense cellular networks
User association and power allocation based on unsupervised graph model in ultra-dense network
DeepMPLS: Fast analysis of MPLS configurations using deep learning
Network routing optimization based on machine learning using graph networks robust against topology change
On estimating communication delays using graph convolutional networks with semi-supervised learning
Service function chaining and traffic steering in SDN using graph neural network
Graph neural network-based virtual network function deployment prediction
Accelerating virtual network embedding with graph neural networks
A connectionist approach to dynamic resource management for virtualised network functions
Improvements to deep-learning-based feasibility prediction of switched Ethernet network configurations
A novel satellite network traffic prediction method based on GCN-GRU
Towards more realistic network models based on graph neural networks
On the use of graph neural networks for virtual network embedding
Optical network traffic prediction based on graph convolutional neural networks, in: 2020 Opto-Electronics and Communications Conference (OECC)
Automating botnet detection with graph neural networks, in: AutoML for Networking and Systems Workshop of MLSys 2020 Conference
A graph neural network approach for scalable wireless power control
Large scale wireless power allocation with graph neural networks
Wireless power control via counterfactual optimization of graph neural networks
Learning and generating distributed routing protocols using graph-based deep learning
DeepBGP: A machine learning approach for BGP configuration synthesis, in: Proceedings of the Workshop on Network Meets AI & ML, 2020
Scalable power control/beamforming in heterogeneous wireless networks with graph neural networks
The graph neural network model
Discriminative embeddings of latent variable models for structured data
Inductive representation learning on large graphs
Learning graph embedding with adversarial training methods
Deep convolutional networks on graph-structured data
Convolutional neural networks on graphs with fast localized spectral filtering
Semi-supervised classification with graph convolutional networks
Neural message passing for quantum chemistry
Relational inductive biases, deep learning, and graph networks
How powerful are graph neural networks?
Graph attention networks
Gated graph sequence neural networks
Diffusion convolutional recurrent neural network: Data-driven traffic forecasting
Structural-RNN: Deep learning on spatio-temporal graphs
Heterogeneous graph attention network
How neural architectures affect deep learning for communication networks?
Adversarial attacks on neural networks for graph data
Stability properties of graph neural networks
Convergence and stability of graph convolutional networks on large random graphs
Explainability methods for graph convolutional neural networks
GNNExplainer: Generating explanations for graph neural networks
Topology aware deep learning for wireless network optimization
Decentralized inference with graph neural networks in wireless communication systems
Fast power control adaptation via meta-learning for random edge graph neural networks
A vision and framework for the high altitude platform station (HAPS) networks of the future
Graph attention networks for channel estimation in RIS-assisted satellite IoT communications
A graph neural network based intrusion detection system
Spatio-temporal modeling for large-scale vehicular networks using graph convolutional networks
Applying graph-based deep learning to realistic network scenarios
Dynamic graph neural network for traffic forecasting in wide area networks
Traffic prediction for dynamic traffic engineering
A survey of blockchain from the perspectives of applications, challenges, and opportunities
Resource allocation via graph neural networks in free space optical fronthaul networks
Deep reinforcement learning meets graph neural networks: Exploring a routing optimization use case
Are we ready for SDN? Implementation challenges for software-defined networks
Reinforcement learning of graph neural networks for service function chaining
Generative adversarial nets
GraphGAN: Graph representation learning with generative adversarial nets
AutoML: A survey of the state-of-the-art, Knowledge-Based Systems
A comprehensive survey of graph embedding: Problems, techniques, and applications
Joint user association and power allocation in heterogeneous ultra dense network via semi-supervised representation learning
Unsupervised learning for asynchronous resource allocation in ad-hoc wireless networks, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Learning decentralized wireless resource allocations with graph neural networks
A GNN based supervised learning framework for resource allocation in wireless IoT networks
Distributed scheduling using graph neural networks
GBLinks: GNN-based beam selection and link activation for ultra-dense D2D mmWave networks
Graph neural network based access point selection for cell-free massive MIMO systems
A DRL-based multiagent cooperative control framework for CAV networks: a graphic convolution Q network
Cellular network traffic prediction incorporating handover: A graph convolutional approach, Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)
Graph neural network for large-scale network localization
ATARI: A graph convolutional neural network approach for performance prediction in next-generation WLANs
Routing in small satellite networks: A GNN-based learning approach