Title: A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy, Robustness, Fairness, and Explainability
Authors: Enyan Dai, Tianxiang Zhao, Huaisheng Zhu, Junjie Xu, Zhimeng Guo, Hui Liu, Jiliang Tang, Suhang Wang
Date: 2022-04-18

Graph Neural Networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcomes they desire with unnoticeable perturbations on the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and to increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss the future research directions of each aspect and the connections between these aspects that help achieve trustworthiness.

[Figure: The ethical principles of trustworthy AI.]

Building on the success of deep neural networks on independent and identically distributed (i.i.d) data such as images, Graph Neural Networks (GNNs) [25, 96, 204, 238] have been investigated to generalize deep neural networks to graph-structured data. GNNs have shown great performance in applications across domains including finance [70, 120], healthcare [108], and social analysis [54, 175]. The success of GNNs relies on the message-passing mechanism, where node representations are updated by aggregating information from their neighbors. At the same time, the deployment of GNNs in real-world applications raises concerns in robustness, privacy, fairness, and explainability. In this survey, we also discuss the interactions among these trustworthiness aspects in the future directions.

Due to the demand for trustworthy GNNs, a large body of literature on different aspects of trustworthy GNNs has emerged in recent years. For example, robust GNNs against perturbations from attackers have been developed [34, 88, 201]. To prevent the leakage of private information, privacy-preserving GNN models [113, 200] have also been proposed for various real-world applications such as financial analysis. Fair GNNs [35] and explainable GNNs [36, 216] have also become hot topics to address trustworthiness concerns. There are several surveys of GNNs on robustness [87, 171, 197, 243] and explainability [221]. However, none of them thoroughly discusses the trustworthiness of GNNs, which should also cover the dimensions of privacy and fairness. For the aspects of robustness and explainability, they also do not include emerging directions and techniques such as scalable attacks, backdoor attacks, and self-explainable GNNs, which are discussed in this survey.
A recent survey [114] reviews trustworthy AI systems, but it mainly focuses on techniques for i.i.d data. Considering the complexity of graph topology and the message-passing mechanism deployed in GNNs, trustworthy AI techniques designed for i.i.d data generally cannot be directly adopted for graph-structured data. Therefore, in this paper, we give a comprehensive survey of existing trustworthy GNNs from the computational perspectives of privacy, robustness, fairness, and explainability. To summarize, our major contributions are:
• In Section 3, we give a comprehensive survey of existing works on privacy attacks and defenses for GNNs, followed by future directions. The graph datasets used in the privacy domain are also listed.
• Various categories of adversarial attack and defense methods on GNN models are discussed in Section 4. Recent advances in GNN robustness, such as scalable attacks, graph backdoor attacks, and self-supervised learning defense methods, are further introduced.
• The fairness of trustworthy GNNs is thoroughly discussed in Section 5, which covers the biases and fairness definitions on graph-structured data, various fair GNN models, and the datasets they are applied to.
• A comprehensive survey of GNN explainability is presented in Section 6, in which we go through the motivations, challenges, and experimental settings adopted by existing works. A taxonomic summary of methodologies is also introduced.

To facilitate the discussion of trustworthy GNNs, we first introduce notations, the basic design of GNNs, and graph analysis tasks in this section. We use G = (V, E) to denote a graph, where V = {v_1, ..., v_N} is the set of nodes and E ⊆ V × V is the set of edges. The graph can be either an attributed or a plain graph. For an attributed graph, node attributes X = {x_1, ..., x_N} are provided, where x_i ∈ R^d corresponds to the d-dimensional attributes of node v_i. A ∈ R^{N×N} is the adjacency matrix of the graph G, where A_{ij} = 1 if nodes v_i and v_j are connected; otherwise, A_{ij} = 0.

Apart from the node features, the graph topology also offers crucial information for representation learning. Generally, GNNs adopt the message-passing mechanism to learn node representations that capture both node features and graph topology information. Specifically, in each layer, GNNs update the representation of a node by aggregating the information from its neighboring nodes. As a result, a K-layer GNN model captures the information of the local graph containing the K-hop neighbors of the central node. The general form of updating representations in the k-th layer of GNNs can be formulated as:

h_v^(k) = UPDATE^(k)(h_v^(k-1), AGGREGATE^(k)({h_u^(k-1) : u ∈ N(v)})),   (1)

where h_v^(k) stands for the representation of node v ∈ V after the k-th GNN layer and N(v) denotes the set of neighbors of node v. For node classification, a linear classifier can be applied to the representation h_v to predict the label of node v. For graph classification, a READOUT function summarizes the node embeddings into a graph embedding h_G for further predictions:

h_G = READOUT({h_v : v ∈ V}),   (2)

where READOUT can be various graph pooling functions such as max pooling and average pooling. Similar to node classification, graph classification can be conducted by applying a linear classifier on the graph embedding h_G. Extensive graph neural networks that follow Eq.(1) have been proposed. Here, we only introduce the design of GCN [96], which is one of the most popular GNN architectures.
For the design of other GNNs, please refer to the survey of GNN models [241]. More specifically, each layer of GCN can be written as:

H^(k) = σ(Ã H^(k-1) W^(k)),   (3)

where H^(k) denotes the representations of all the nodes after the k-th layer and W^(k) stands for the parameters of the k-th layer. Ã is the normalized adjacency matrix. Generally, the symmetric normalized form is used, which can be written as Ã = D^{-1/2}(A + I)D^{-1/2}, where D is a diagonal degree matrix with D_ii = Σ_j (A + I)_ij and I is the identity matrix. σ is the activation function such as ReLU.

The node representations learned by GNNs can facilitate various tasks, such as node classification, link prediction, community detection, and graph classification. Next, we briefly introduce them.

Node Classification. Many real-world problems can be treated as node classification problems, such as user attribute prediction in social media [35, 237], fraud detection in transaction networks [189, 207], and protein function prediction on protein-protein interaction networks [212]. In node-level classification, the GNN model aims to infer the labels of the test nodes V_U given the graph G = (V, E). Generally, the node classification task is semi-supervised, where only a subset of nodes V_L ⊂ V is provided with labels Y_L. Based on whether the test samples V_U are seen during the training phase, node classification can be split into the transductive setting and the inductive setting. In the transductive setting, the test nodes V_U are available during the training phase, and their node features can be utilized for better prediction performance. In contrast, in the inductive setting the test nodes are completely new to the trained GNN model.

Community Detection. Communities are subgraphs of a network whose nodes are more densely connected to each other than to the rest of the network. Formally, the set of communities in the network can be represented by {C_1, ..., C_M}, where each C_i ⊂ G is a partition of the whole graph G. These communities could be either disjoint or overlapping. The goal of community detection is to identify which communities each node v ∈ G belongs to. Community detection is often unsupervised [160, 193, 229]. Recently, supervised community detection based on GNNs has also been investigated [32]. Community detection is useful in various domains such as social network analysis [64] and functional region identification in the brain [61].

Link Prediction. Given a graph G = (V, E), the link prediction model predicts the existence of a link between nodes u ∈ V and v ∈ V, where (u, v) ∉ E. A very common form of link prediction is to give predictions based on the representations of two nodes from a GNN model. Let h_u and h_v denote the representations of nodes u and v; the prediction can be formulated as p(u, v) = f(h_u, h_v). Link prediction has various applications such as friend recommendation on social media [153] and knowledge graph completion [10].

Graph Classification. For graph classification, each graph instance belongs to a certain class. The training set can be denoted as D_L = {(G_i, y_i)}_{i=1}^{|D_L|}, where y_i denotes the label of graph G_i and |D_L| represents the number of graphs in the training set. The goal of graph classification is to learn a function f: G → y to classify the unlabeled test graphs D_U. As mentioned in Section 2.2, a READOUT function is added to the GNN model to obtain the graph embedding for classification. There are many applications of graph classification such as property prediction of drugs [83], where each drug is represented as a graph.
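To make the message-passing update in Eq.(1), the READOUT in Eq.(2), and the GCN layer in Eq.(3) concrete, below is a minimal PyTorch-style sketch of a two-layer GCN with symmetric normalization and an average-pooling readout. It is an illustrative sketch rather than the implementation of any surveyed method; the class and function names (GCN, normalize_adj, readout) are our own.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def normalize_adj(A):
        # A_tilde = D^{-1/2} (A + I) D^{-1/2}, as in Eq.(3)
        A_hat = A + torch.eye(A.size(0))
        deg = A_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    class GCN(nn.Module):
        def __init__(self, in_dim, hid_dim, out_dim):
            super().__init__()
            self.W1 = nn.Linear(in_dim, hid_dim, bias=False)
            self.W2 = nn.Linear(hid_dim, out_dim, bias=False)

        def forward(self, X, A_norm):
            # each layer aggregates 1-hop neighbor information: H = sigma(A_norm H W)
            H = F.relu(A_norm @ self.W1(X))
            return A_norm @ self.W2(H)      # node representations / class logits

    def readout(H):
        # average-pooling READOUT summarizing node embeddings into a graph embedding
        return H.mean(dim=0)

    # usage on a toy graph with 4 nodes and 3-dimensional attributes
    A = torch.tensor([[0., 1., 0., 0.],
                      [1., 0., 1., 0.],
                      [0., 1., 0., 1.],
                      [0., 0., 1., 0.]])
    X = torch.randn(4, 3)
    model = GCN(in_dim=3, hid_dim=8, out_dim=2)
    logits = model(X, normalize_adj(A))     # per-node outputs for node classification
    graph_emb = readout(logits)             # graph-level embedding for graph tasks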
Similar to deep learning algorithms on images and texts, the remarkable achievements of GNNs also rely on big data. Extensive sensitive data are collected from users to obtain powerful GNN models for various services in critical domains such as healthcare [109], banking systems [189], and bioinformatics [108]. For example, GNNs have been applied to brain networks for fMRI analysis [108]. In addition, the GNN model owner may provide a query API service to share the knowledge learned by GNNs. It is also very common that pretrained GNN models are released to third parties for knowledge distillation or various downstream tasks [117]. However, the collection and utilization of private data for GNN model training, API services, and model release threaten the safety of private and sensitive information. First, GNNs are generally trained in a centralized way, where the users' data and models are stored on a centralized server. In the case of an untrustworthy centralized server, the collected sensitive attributes might be leaked through unauthorized usage or data breaches. For instance, the personal data of more than half a billion Facebook users was leaked online for free in a hacker forum in 2021 1. Second, the private information of users can also be leaked from model release or the provided services due to privacy attacks. Taking an online service for brain disease classification as an example, a membership inference attack can figure out which patients are covered in the training dataset, which severely threatens the privacy of patients. Moreover, various types of privacy attacks [46, 232] such as link inference and attribute inference have proved effective in stealing users' information from pretrained models. Therefore, it is crucial to develop privacy-preserving GNNs to achieve trustworthiness.

There are several surveys on the privacy aspect of machine learning models. The work in [149] comprehensively reviews current privacy attacks on i.i.d data, and privacy-preserving methods are reviewed in [6, 82, 90, 213] as well. However, they are overwhelmingly dedicated to privacy issues of models for i.i.d data such as images and text, and rarely discuss privacy attacks and defense methods on graphs, which are challenged by the topology information in graphs and the message-passing mechanism of GNN models. Therefore, in this section, we give an overview of privacy attacks on GNNs and of privacy-preserving GNNs that defend against privacy attacks. We also include related datasets and the applications of privacy-preserving GNNs, followed by future research directions on privacy-preserving GNNs.

In this subsection, we will introduce the categorization of privacy attacks on GNNs according to the target private information. We will also briefly explain two settings of the attacker's accessible knowledge for conducting privacy attacks. Finally, we will present more details of existing privacy attack methods on GNNs.

Table 1. Different types of privacy attack methods on GNNs.
  Membership inference: [46], [137], [75], [199]
  Property inference: [232]
  Reconstruction attack: [74], [232], [46], [234]
  Model extraction: [198], [162]

Table 2. Categorization of attacker's knowledge.
  White-box: [46], [234]
  Black-box: [46], [137], [75], [199], [198], [162], [232], [74]

3.1.1 Types of Privacy Attacks on GNNs. The goal of privacy attacks on GNNs is to extract information that is not intended to be shared.
The target information can be about the training graph, such as the membership, sensitive attributes of nodes, and connections of nodes. In addition, some attackers aim to extract the model parameters of GNNs. Based on the target knowledge, privacy attacks can generally be split into four categories:
• Membership Inference Attack: In the membership inference attack, the attacker tries to determine whether a target sample is part of the training set. For example, suppose researchers train a GNN model on a social network of COVID-19 patients to analyze the propagation of the virus. A membership inference attack can identify whether a target subject is in the training patient network, resulting in information leakage about the subject. Different from i.i.d data, the target samples can be nodes or graphs. For instance, for the node classification task, the target sample can be a subgraph of the target node's local graph [137] or only contain the node attributes [75]. For the graph classification task, the target sample is a graph to be classified [199].
• Reconstruction Attack: The reconstruction attack, also known as the model inversion attack, aims to infer private information of the input graph. Since graph-structured data is composed of graph topology and node attributes, reconstruction attacks on GNNs can be split into structure reconstruction, i.e., inferring the structures of target samples, and attribute reconstruction (also known as the attribute inference attack), i.e., inferring the attributes of target samples. Generally, the embeddings of the target samples are required to conduct a reconstruction attack.
• Property Inference Attack: Different from the attribute reconstruction attack, the property inference attack aims to infer dataset properties that are not encoded as features. For instance, one may want to infer the ratio of women to men in a social network, where this information is not contained in the node attributes. The attacker may also be interested in structure-related properties such as the degree of a node, i.e., the number of friends of the target user in a social network [232].
• Model Extraction Attack: This attack aims to extract the target model information by learning a model that behaves similarly to the target model. It may focus on different aspects of the model information, which results in two goals in model extraction: (i) the attacker aims to obtain a model that matches the accuracy of the target model; (ii) the attacker tries to replicate the decision boundary of the target model. Model extraction attacks can threaten the security of models deployed as API services [135] and can be a stepping stone for various privacy attacks and adversarial attacks.

Table 1 categorizes existing privacy attack methods on GNNs based on the attack types. We will introduce the details of these methods in Section 3.2. To conduct privacy attacks, attackers usually possess auxiliary knowledge about the target GNN and/or dataset. In this subsection, we introduce the categorization of threat models of privacy attacks with respect to the attacker's knowledge. Generally, based on whether the model parameters of the target GNN are available, the attacker's knowledge can be split into two settings, i.e., white-box attack and black-box attack:
• White-Box Attack: In the white-box attack, the model parameters or the gradients during training are accessible to the attackers.
Apart from the knowledge about the trained GNNs, the attacks may require other knowledge, such as the nodes/graphs to be attacked in inference attacks and a shadow dataset, i.e., a dataset that follows the same distribution as the training dataset of the target GNN. The white-box attack can be used to attack pretrained GNNs whose models are publicly released [151]. It is also practical during the training process of federated learning [46], where intermediate computations such as model updates are exchanged.
• Black-Box Attack: In contrast to the white-box attack, the parameters of the target GNN are unknown in the black-box attack, while the architecture of the target GNN and the hyperparameters used during training may be known. In this setting, attackers are generally allowed to query the target GNN model to get the prediction vectors or embeddings of the queried samples. Similar to the white-box attacks, shadow datasets and the target nodes/graphs are also required to conduct black-box attacks. A practical example of the black-box privacy attack is attacking an API service that returns the output of the GNN model when receiving queries from users.

The categorization of existing privacy attack methods according to the assumption on the attacker's knowledge is shown in Table 2.

The Unified Framework. Supervised privacy attack is a common design strategy of privacy attacks [46, 74, 75, 137, 162, 198, 199, 232]. The core idea of supervised privacy attack methods is to utilize the shadow dataset and the output of the target model to obtain supervision for training a privacy attack model, which can be described by the unified framework shown in Figure 2a. As shown in Figure 2a, the attacker uses a shadow dataset D_S as input to the target model f_T to obtain the predictions or embeddings. Then, the ground truth of various types of privacy attacks on the shadow dataset can be attained. With the attack labels from the shadow dataset, attackers can train an attack model f_A that performs inference based on the outputs of the target model by:

min_{θ_A}  Σ_{G_i ∈ D_S} ℓ(f_A(f_T(G_i)), y_i^A),   (4)

where G_i indicates a sample from the shadow dataset, which can be a subgraph of node v_i's local graph for node classification or a sample graph for graph classification. y_i^A represents the extracted attack label of the sample G_i. As illustrated in Fig. 2a, it can vary from attributes to network properties for different privacy attacks. ℓ(·) denotes the loss function, such as the cross entropy loss, used to train the attack model f_A, and θ_A denotes the parameters of the attack model f_A. After the attack model is trained, a privacy attack on a target example G_t can be conducted by f_A(f_T(G_t)). Next, we give more details about the methods of each type of privacy attack.

Membership Inference Attack. The membership inference attack aims to identify whether a target sample was used for training the target model f_T. The privacy leakage of membership is caused by the overfitting of the model on the training dataset, which leads the prediction vectors (predicted label distributions) of the training and test datasets to follow different distributions [163, 199]. Thus, an attacker can utilize the prediction vector to judge whether a data instance was in the training set of f_T. To learn a membership inference attack model, the most common way is to apply shadow training to obtain supervision for membership inference and train an attack model. The process of shadow training is shown in Fig. 2b.
In shadow training, part of the shadow dataset, D_S^{train}, is used to train a surrogate model f_S to mimic the behavior of the target model as:

min_{θ_S}  Σ_{G_i ∈ D_S^{train}} ℓ(f_S(G_i), f_T(G_i)),   (5)

where G_i is a graph for graph classification or a k-hop subgraph centered at node v_i for node classification. f_T(G_i) denotes the predicted label distribution of G_i from the target model, and ℓ(·) is a loss function such as the cross entropy loss that ensures f_S(G_i) is similar to f_T(G_i). Since G_i ∈ D_S^{train} is used for training f_S, the predictive probability vector of G_i from f_S, i.e., [f_S(G_i), y_i], can be labeled as positive, where y_i is the ground-truth class label of G_i, used to help the attack model judge whether f_S(G_i) is overfitting. Similarly, the probability vectors of the other shadow samples D_S^{out} = D_S − D_S^{train} can be labeled as negative samples for the membership inference attack. Then, the prediction vectors and the obtained membership labels are used to train an attack model such as logistic regression, which can infer whether a sample is in the training dataset of the target model.

Membership inference attacks have already been extensively investigated on i.i.d data [81, 145, 163]. Due to the great success of GNNs, membership inference attacks on graphs have recently attracted increasing attention [46, 75, 137, 199]; these works generally follow the same scheme of shadow training. In [137], membership inference based on subgraphs of the target node's local graph is investigated for the node classification task. More specifically, the shadow dataset D_S in [137] is a graph that comes from the same underlying distribution as the graph used for training, and a GCN is deployed as the shadow model f_S. In [75], the authors further investigate the attack on target samples that are provided only with node features. To infer the membership of a single test node v, the attack model is applied as f_A(f_S(x_v)), where f_S(x_v) denotes the prediction of f_S when only the attributes of the single node v are fed into the surrogate GNN. In [199], the authors also adopt the introduced framework; the major difference is that they achieve the membership inference attack on the graph classification task.

Reconstruction Attack. As mentioned in Sec. 3.1.1, the reconstruction attack aims to infer the sensitive attributes and/or links in target datasets. Due to the message passing of GNNs, the learned node/graph embeddings capture both node attributes and graph structure information. Thus, the existing reconstruction attack method [46] reconstructs the information from the node embeddings H = [h_1, ..., h_N] learned by the target GNN. For attribute reconstruction, the attack model f_A can simply be a multilayer perceptron (MLP) that reconstructs the attributes as X̂ = f_A(H). For link inference, f_A generally predicts the link based on the embeddings of nodes u and v by p(u, v) = f_A(h_u, h_v). Following the unified framework, a shadow graph G_S is used to provide the adjacency matrix A_S and sensitive attributes X_S as supervision, and the node embeddings H_S of the shadow graph are assumed to be available. Then, the attack model can be trained in a supervised manner. For the attribute reconstruction attack, the training loss is given as min_{θ_A} Σ_{v ∈ G_S} ℓ(x_v, x̂_v), where ℓ could be the MSE loss for continuous attributes or the cross entropy loss for categorical attributes. As for link inference, the objective function is min_{θ_A} ∥Â_S − A_S∥^2, where Â_S is the adjacency matrix reconstructed by f_A. Similar to link prediction, negative sampling may also be applied here.
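As a concrete illustration of the embedding-based link reconstruction attack described above, the sketch below trains an MLP attack model f_A(h_u, h_v) on a shadow graph whose adjacency matrix is known, using simple negative sampling, and the trained model can then score pairs of target-graph node embeddings. This is a simplified sketch under our own naming (LinkInferenceAttack, train_link_attack), not the exact procedure of [46].

    import torch
    import torch.nn as nn

    class LinkInferenceAttack(nn.Module):
        # f_A(h_i, h_j): an MLP scoring the existence of edge (i, j) from embeddings
        def __init__(self, emb_dim):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

        def forward(self, h_i, h_j):
            return self.mlp(torch.cat([h_i, h_j], dim=-1)).squeeze(-1)

    def train_link_attack(H_shadow, A_shadow, epochs=200, lr=0.01):
        """H_shadow: node embeddings of the shadow graph (from the target GNN);
        A_shadow: its known adjacency matrix, used as supervision."""
        attack = LinkInferenceAttack(H_shadow.size(1))
        opt = torch.optim.Adam(attack.parameters(), lr=lr)
        bce = nn.BCEWithLogitsLoss()
        pos = A_shadow.nonzero(as_tuple=False)               # observed edges
        # crude negative sampling: random pairs (may rarely hit true edges)
        neg = torch.randint(0, H_shadow.size(0), pos.shape)
        pairs = torch.cat([pos, neg], dim=0)
        labels = torch.cat([torch.ones(len(pos)), torch.zeros(len(neg))])
        for _ in range(epochs):
            opt.zero_grad()
            scores = attack(H_shadow[pairs[:, 0]], H_shadow[pairs[:, 1]])
            bce(scores, labels).backward()
            opt.step()
        return attack

At attack time, the trained model is simply applied to pairs of node embeddings obtained from the target graph to decide whether an edge is likely to exist.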
Reconstructing the adjacency matrix from graph embeddings is also investigated in [232]. The main difference from [46] is that the attack model directly infers the adjacency matrix from the graph embedding by Â = f_A(h_G), where h_G denotes a graph embedding. For some applications, the embeddings might not be available, while the prediction vector of an instance can be obtained by querying the target model. Therefore, He et al. [74] propose to infer the link between two nodes from their prediction vectors as p(u, v) = f_A(y_u, y_v), where y_u is the prediction vector of node u from the target model f_T. Following shadow training, the authors first query the prediction vectors of the shadow dataset from the target model f_T. Then, a surrogate model is trained on the shadow dataset and the queried prediction vectors. Finally, the link inference attack model is learned with the supervision generated from the surrogate model.

The aforementioned methods all focus on black-box settings. However, with the development of GNN pretraining and federated learning, which share the model parameters, white-box reconstruction attack methods have also started to attract attention. For example, GraphMI [234] proposes to reconstruct the adjacency matrix in a white-box setting where the trained model parameters are known. Intuitively, the reconstructed adjacency matrix Â will be similar to the original adjacency matrix if the loss between the true node labels and the predictions using the reconstructed matrix is minimized. In addition, the graph structure is updated to ensure accurate predictions from the GNN model under a feature smoothness constraint. Formally, GraphMI aims to solve the following optimization problem:

min_Â  ℓ(f_T(Â, X), Y) + α tr(X^T L̂ X) + β ∥Â∥_1,   (6)

where L̂ = D̂ − Â is the Laplacian matrix of Â and D̂ is the diagonal degree matrix of Â. ∥Â∥_1 is the ℓ1-norm that makes Â sparse. α and β control the contributions of the feature smoothness constraint and the sparsity constraint on the adjacency matrix Â.

Property Inference Attack. The property inference attack is still at an early stage. An initial effort is made in [232] to infer the properties of a target graph from its embedding. The proposed method also follows the unified framework. Let p_i denote the property of a graph G_i in the shadow dataset D_S; the attack model is trained by min_{θ_A} Σ_{G_i ∈ D_S} ℓ(f_A(h_{G_i}), p_i), where ℓ can be the MSE loss or the cross entropy loss for different types of properties.

Model Extraction Attack: The model extraction attack aims to learn a surrogate model that behaves similarly to the target model. The process of training a surrogate model is also included in the membership inference attack, as shown in Fig. 2b. Generally, the attacker first queries the target model to obtain predictions on the shadow dataset. It then leverages the shadow dataset and the corresponding predictions to train the surrogate model for the model extraction attack, which has been formulated in Eq. (5). Following this framework, recent works [162, 198] have investigated GNN model extraction attacks with different levels of knowledge about the shadow dataset and the training graph of the target model. For example, the general framework of model extraction is applied for training on shadow datasets that are provided with graph structures [162, 198]. If no structure is given in the shadow dataset, the missing graph structure can first be learned. For example, in [162], the graph structure is initialized by KNN on the node attributes and then updated by a graph structure learning framework [31].
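To make the shadow-data recipe shared by model extraction and shadow training (Eq.(5), Fig. 2b) concrete, the following sketch queries a black-box target model for prediction vectors on shadow samples and fits a surrogate GNN to match them. The callables target_query and surrogate, and the KL-divergence matching loss, are our own illustrative choices rather than the exact design of [162] or [198]; any normalization of A is assumed to happen inside the surrogate.

    import torch
    import torch.nn.functional as F

    def extract_model(target_query, surrogate, shadow_graphs, epochs=100, lr=0.01):
        """Model extraction via shadow data: query the (black-box) target model
        for prediction vectors on shadow samples and fit a surrogate to match
        them. target_query(X, A) returns the target's softmax output as a tensor;
        surrogate is any differentiable GNN, e.g., the GCN sketched in Section 2."""
        # collect (input, target-prediction) pairs by querying the API
        supervision = [(X, A, target_query(X, A).detach()) for X, A in shadow_graphs]
        opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
        for _ in range(epochs):
            for X, A, y_target in supervision:
                opt.zero_grad()
                log_probs = F.log_softmax(surrogate(X, A), dim=-1)
                # match the target's output distribution (in the spirit of Eq.(5))
                loss = F.kl_div(log_probs, y_target, reduction='batchmean')
                loss.backward()
                opt.step()
        return surrogate

The resulting surrogate can be used directly as the extracted model, or as the shadow model whose prediction vectors supervise a downstream membership inference classifier.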
Table 3. Categorization of privacy-preserving GNNs.
  Differential Privacy — [136], [152], [208], [228]
  Federated Learning — Membership Inference: [200], [240], [115]; others: [73], [142], [205], [185], [239]
  Adversarial Privacy-Preserving — Attribute Reconstruction: [106], [113]; Structure Reconstruction: [183]

3.3 Privacy-Preserving Graph Neural Networks
As GNNs are vulnerable to privacy attacks and may leak the private information of users, privacy-preserving GNNs have been developed to protect privacy. Current privacy-preserving GNNs generally fall into three categories, i.e., differential privacy, federated learning, and adversarial privacy-preserving. The categorization of privacy-preserving GNNs is listed in Table 3. Next, we introduce representative and state-of-the-art methods in each category.

3.3.1 Differential Privacy for Privacy-Preserving GNNs. Differential Privacy (DP) [48] is a popular approach that can provide privacy guarantees for the training data. The core idea of differential privacy is that if two datasets differ by only one record and are used by the same algorithm, the outputs of the algorithm on the two datasets should be similar. With differential privacy, the impact of a single sample is strictly controlled. Thus, the membership inference attack can be defended by DP with a theoretical guarantee. Formally, differential privacy is defined as follows:

Definition 3.1 ((ε, δ)-Differential Privacy [48]). Given ε > 0 and δ ≥ 0, a randomized mechanism M satisfies (ε, δ)-differential privacy if for any adjacent datasets D and D′ and for any subset of outputs S, the following inequality holds:

Pr[M(D) ∈ S] ≤ e^ε Pr[M(D′) ∈ S] + δ,   (7)

where ε is the privacy budget that trades off utility and privacy. A smaller ε leads to a stronger privacy guarantee but weaker utility. When δ = 0, it is equivalent to ε-Differential Privacy. (ε, δ)-DP allows the possibility that plain ε-DP is broken with a small probability δ.

To achieve (ε, δ)-differential privacy, additive noise mechanisms such as the Gaussian mechanism [49] and the Laplace mechanism [48] are widely adopted. Based on the privacy budget and the sensitivity of the function to be protected, a certain level of Gaussian or Laplace noise is injected to obtain a differentially private mechanism. Recently, various differentially private deep learning methods [1, 8, 141, 152] have been proposed to protect training data privacy. For instance, NoisySGD [1] adds noise to the gradients during model training so that the trained model parameters will not leak the training data, with a certain guarantee. PATE [141] first trains an ensemble of teacher models on disjoint subsets of the sensitive training data. Then, a student model is trained with the aggregated outputs of the ensemble on public data. As a result, the student model is unlikely to be affected by the change of a single sensitive record, which meets differential privacy; the theoretical privacy guarantee of PATE is also analyzed in [141]. In standard differential privacy, a trusted curator is required to apply calibrated noise. To handle the situation of an untrusted curator, local differential privacy methods [8, 152] that perturb users' data locally before uploading to the central server have also been investigated for privacy protection.
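As a simple illustration of the additive-noise mechanisms mentioned above, the sketch below applies the Laplace mechanism to clipped node features and perturbs them locally before they leave the user's device, splitting the budget ε evenly across feature dimensions via sequential composition. This is a didactic sketch with our own function names, not the calibrated mechanism of any specific surveyed method.

    import numpy as np

    def laplace_mechanism(value, sensitivity, epsilon):
        # epsilon-DP: add Laplace(0, sensitivity / epsilon) noise to a numeric query
        scale = sensitivity / epsilon
        return value + np.random.laplace(loc=0.0, scale=scale, size=np.shape(value))

    def perturb_node_features(X, epsilon, feat_range=1.0):
        """Locally perturb each user's feature vector. With features clipped to
        [0, feat_range], each coordinate has l1-sensitivity feat_range; splitting
        epsilon evenly across the d dimensions (sequential composition) gives an
        overall epsilon-DP release of the feature vector."""
        X = np.clip(X, 0.0, feat_range)
        d = X.shape[1]
        return laplace_mechanism(X, sensitivity=feat_range, epsilon=epsilon / d)

    # example: 5 users with 3-dimensional (clipped) attributes
    X = np.random.rand(5, 3)
    X_private = perturb_node_features(X, epsilon=1.0)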
To protect the privacy of graph-structured data, many differentially private network embedding methods have been investigated [136, 152, 208, 228]. For instance, DPNE [208] applies perturbations to the objective function of learning network embeddings. In [227], a perturbed gradient descent method that guarantees the privacy of graph embeddings learned by matrix factorization is proposed. More recently, several works [136, 152] focusing on differentially private GNNs have been explored. For example, the locally private GNN [152] adopts local differential privacy [92] to protect the privacy of node features by perturbing each user's features locally. Furthermore, a robust graph convolution layer is investigated to reduce the negative effects of the injected noise. PrivGNN [136] extends Private Aggregation of Teacher Ensembles (PATE) [141] to graph-structured data to release GNNs with differential privacy guarantees. In particular, random subsampling of the teacher training set and a noisy labeling mechanism on the public data are used in PrivGNN [136] to achieve practical privacy guarantees. As for [228], both user feature perturbation at the input stage and loss perturbation at the optimization stage are investigated to achieve a privacy-preserving GNN for recommendation. In addition, the authors empirically show that their method can help defend against the attribute inference attack, as the noise added to node features prevents the attacker from reconstructing the original sensitive attributes.

3.3.2 Federated Learning for Privacy-Preserving GNNs. Currently, the majority of deep learning methods require centralized storage of user data for training. However, this can be unrealistic due to privacy issues. For example, when several companies or hospitals want to combine their data to train a GNN model, the data of their users are not allowed to be shared according to the privacy terms. Furthermore, users may be unwilling to upload their data to the platform server due to concerns about information leakage. In such situations, the data remain on the local devices of users or the data-holder organizations. To address this problem, federated learning [98, 127] is proposed to collectively learn models from decentralized user data in a privacy-preserving manner. In particular, it aims to optimize the following objective function:

min_w  Σ_{k=1}^{K} p_k L_k(w; D_k),   (8)

where K is the total number of devices/clients, D_k is the local dataset stored on the k-th client, and L_k is the local objective function of the k-th device. The impact of each device is controlled by p_k with p_k ≥ 0 and Σ_k p_k = 1. p_k is often set as 1/K or |D_k|/|D|, where |D| = Σ_k |D_k| is the total number of samples. A general framework of federated learning to solve Eq.(8) is given in Fig. 3, where the user data and a local model are maintained locally on the k-th client. In training step t, each client computes local model updates based on its own data. Then, the central server aggregates the model updates from the clients and updates the global model parameters to w^{t+1}. The updated global model is then distributed to the clients for future iterations. The federated learning framework was first proposed as FedAvg [127], which is the most commonly used federated learning algorithm. Specifically, FedAvg aggregates the model parameters by averaging the updated model parameters from the clients as w^{t+1} = Σ_{k=1}^{K} p_k w_k^{t}, where w_k^{t} denotes the updated parameters of the k-th client at step t. A more comprehensive survey of federated machine learning can be found in [214].
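The sketch below simulates one communication round of the FedAvg-style aggregation described above: each client trains a local copy of the global model on its own data, and the server averages the returned parameters weighted by local data size (p_k = |D_k|/|D|). It is a single-process simulation with our own function names, and it assumes a model with floating-point parameters only (e.g., the GCN sketched in Section 2).

    import copy
    import torch

    def local_update(global_model, data, loss_fn, lr=0.01, epochs=1):
        # client-side step: train a local copy of the global model on private data
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in data:                      # data: list of (inputs, labels)
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        return model.state_dict(), sum(len(y) for _, y in data)

    def fedavg_round(global_model, client_datasets, loss_fn):
        # server-side step: weighted average of client parameters (p_k = |D_k|/|D|)
        updates, sizes = zip(*[local_update(global_model, d, loss_fn)
                               for d in client_datasets])
        total = float(sum(sizes))
        new_state = {}
        for key in updates[0]:
            new_state[key] = sum((n / total) * upd[key].float()
                                 for upd, n in zip(updates, sizes))
        global_model.load_state_dict(new_state)
        return global_model

In practice this round is repeated many times, and the raw client data never leave the clients; only the (possibly noised) model parameters are communicated.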
To protect user privacy, federated learning has been extended to train GNNs [73, 115, 142, 185, 200, 205, 239, 240]. To handle the challenges caused by non-i.i.d. graphs, Xie et al. [205] propose to dynamically cluster local systems based on GNN gradients, which can reduce the structure and feature heterogeneity among graphs. The work in [185] proposes a hybrid of meta-learning and federated learning: it views the training on each client as a task in meta-learning and learns a global model to mitigate the issue of non-i.i.d data, and federated learning methods are then leveraged to further update the global model. In [240], vertical federated learning is proposed, which assumes that different clients hold different features and neighborhood information of the user data. To protect the private data, i.e., node attributes, edges, and labels, the computations on private data are carried out by the data holders, while a semi-honest server focuses on computations over encoded, non-private node representations. Apart from federated GNNs for node/graph classification, federated frameworks of GNNs for recommendation have also been developed [115, 200]. FedGNN [200] incorporates high-order user-item interactions by building local user-item graphs in a privacy-preserving way. Furthermore, noise is injected locally in federated learning to satisfy local differential privacy for privacy protection. FeSoG [115] further extends federated GNNs to social recommendation, which involves social information for predictions. Moreover, decentralized federated learning for GNNs is also explored in [73, 142], where a client communicates with a set of neighboring clients for aggregation without a central server. Existing algorithms [73, 142] therefore propose to aggregate the local model with the neighboring local models. For example, decentralized periodic averaging SGD [73] applies SGD on each client locally and synchronizes parameters with neighbors every certain number of iterations.

To defend against the leakage of sensitive attributes/links, adversarial learning is adopted for privacy-preserving GNNs [106, 113, 183]. Let H be the node representations learned by the encoder/GNN as H = f_E(G; θ_E). The core idea of adversarial privacy-preserving is to adopt an adversary f_D to infer sensitive attributes from the node representations H, while the encoder aims to learn representations that can fool f_D, i.e., making f_D unable to infer the sensitive attributes. It is theoretically shown in [183] that through this min-max game, the mutual information between the learned representations and the sensitive attribute, I(H; s), can be minimized, which protects the sensitive information from leakage. The process can be formally written as min_{θ_E} max_{θ_D} L_u(f_E(G; θ_E)) − β L_a(f_D(H; θ_D)), where θ_E and θ_D are the parameters of the encoder f_E and the adversary f_D, respectively. L_u is the loss function that ensures the utility of the learned representations, such as a classification loss or a reconstruction loss. L_a is the adversarial loss, which is generally the cross entropy loss of the adversary's sensitive attribute prediction based on the node representations H. β is the hyperparameter balancing the contributions of the two loss terms. Adversarial privacy-preserving GNNs are first proposed in [106] to defend against the attribute inference attack, where a link prediction loss and a node attribute prediction loss are combined for utility. Adversarial learning is further leveraged in [113] for information obfuscation of sensitive attributes. Both attribute privacy protection and link privacy protection with adversarial learning are discussed in [183].
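The following sketch illustrates one training step of the adversarial privacy-preserving min-max game formulated above: the adversary is updated to predict the sensitive attribute from the representations H, and the encoder (together with a task head) is updated to preserve utility while maximizing the adversary's loss. The module and variable names are ours, and opt_enc is assumed to optimize the parameters of both the encoder and the task head; this is a schematic sketch, not the training procedure of [106], [113], or [183].

    import torch
    import torch.nn as nn

    def adversarial_privacy_step(encoder, task_head, adversary, X, A_norm,
                                 y_task, s_sensitive, beta, opt_enc, opt_adv):
        ce = nn.CrossEntropyLoss()

        # (1) adversary step: learn to predict the sensitive attribute from H
        H = encoder(X, A_norm).detach()
        adv_loss = ce(adversary(H), s_sensitive)
        opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

        # (2) encoder step: keep task utility while fooling the adversary,
        #     i.e., minimize L_u - beta * L_a with respect to the encoder
        H = encoder(X, A_norm)
        utility_loss = ce(task_head(H), y_task)
        privacy_loss = ce(adversary(H), s_sensitive)
        enc_loss = utility_loss - beta * privacy_loss
        opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()
        return utility_loss.item(), privacy_loss.item()

Here the hyperparameter beta plays the same balancing role as in the min-max objective: larger values push the representations to hide the sensitive attribute at some cost in task utility.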
In this subsection, we list the datasets that have been used in the literature on GNN privacy. The statistics of the datasets, along with the papers that use them, are presented in Table 4.
• Cora, Citeseer, PubMed, DBLP [140, 158]: Cora, Citeseer, PubMed, and DBLP are citation network datasets. Cora consists of seven classes of machine learning papers, and CiteSeer has six classes. Papers are represented by nodes, while citations between two papers are represented by edges. Each node has features defined by the words that appear in the paper's abstract. Similarly, PubMed is a collection of abstracts from three types of medical papers, and the data of DBLP comes from four research areas.
• Facebook, LastFM [103]: Facebook and LastFM are social network datasets. Different user accounts are represented by nodes that are connected by edges. Each user node in Facebook has features such as gender, education, hometown, and location. LastFM was collected through the following relationships in the social network.
• Flickr [125]: Flickr is an online photo management and sharing application. The edges reflect common properties shared between two images, whereas each node represents an image submitted to Flickr. Node features are specified by the users' tags, which indicate their interests.
• Reddit [68]: The Reddit dataset represents post-to-post interactions of users. An edge between two posts indicates that the same user commented on both posts. The labels correspond to the community that a post is associated with.
• Coauthor [161]: Coauthor is a co-authorship dataset. Nodes represent authors, and edges connect two authors if they co-authored a paper. Features are collected from keywords of the author's papers. The label of a node is the research area that the author focuses on.
• ACM [192]: In ACM, nodes are papers, and an edge indicates that two papers share an author. The features of a node are the keywords of the paper, while the labels are the conferences in which the papers were published.
• PROTEINS, DD, ENZYMES [132]: DD, ENZYMES, and PROTEINS are macromolecule datasets. In ENZYMES and PROTEINS, nodes are secondary structural elements labeled with their type and a variety of physical and chemical attributes, and an edge links two nodes if they are neighbors along the amino acid sequence or one of the three nearest neighbors in space. ENZYMES assigns enzymes to six classes, which reflect the catalyzed chemical reaction, and the labels in PROTEINS indicate whether a protein is an enzyme. In DD, nodes are amino acids, and edges reflect their spatial proximity.
• AIDS, NCI1, OVCAR-8H, COX2 [132, 232]: AIDS, NCI1, OVCAR-8H, and COX2 are molecule datasets. Atoms are represented by nodes, while chemical bonds are represented by edges. The node features consist of atom types. The labels are decided by toxicity or biological activity in drug discovery projects.

Pretraining and Model Sharing. Nowadays, there is an increasing trend of pretraining models [77, 144] on large-scale datasets to benefit downstream tasks. In practice, the pretrained model is often shared with other parties for their use. However, the pretrained model itself embeds information about the training data, which can cause private data leakage through privacy attacks such as membership inference and attribute inference. Privacy-preserving GNNs can be applied to address this concern. For instance, differentially private GNNs [136, 152, 228] can be adopted in the pretraining phase to defend against the membership inference attack.

Distributed Learning.
Due to the challenges of processing large amounts of data, such as privacy concerns, computational cost, and memory capacity, the demand for distributed learning is increasing dramatically. In this situation, federated learning on GNNs provides solutions to distributed learning by processing data on local devices. In addition, by combining other privacy-preserving methods such as differential privacy [200], the clients' data can be protected from privacy attacks.

Healthcare. Graph-structured data such as protein molecules, brain networks, and patient networks are pervasive in the healthcare domain. GNNs have been trained on these private healthcare data for various applications. For example, GNNs have been used to process electronic health records (EHR) of patients for diagnosis prediction [109], and they are also deployed to better capture the graph signals of brain activity for medical analysis [5]. To ensure the privacy of patients' sensitive data, privacy-preserving GNNs are required for applications in the healthcare domain.

Recommendation Systems. GNNs are widely applied in recommendation systems to incorporate social context and better utilize high-order neighbor information [200]. Similarly, information can be leaked from GNN-based recommendation systems. To protect user privacy, various privacy-preserving recommendation systems have been proposed [115, 200, 228].

Defense Against Various Privacy Attacks. Though many privacy-preserving GNNs have been proposed, they mostly focus on defending against the membership inference attack and the attribute reconstruction attack. Privacy-preserving GNNs against structure reconstruction, property inference, and model extraction attacks are less studied. Therefore, it is promising to develop privacy-preserving GNNs against various privacy attacks.

Privacy Attack and Preservation in GNN Pretraining. Model pretraining has become a common scheme to benefit downstream tasks that lack labels. Recently, pretraining of GNNs with supervised tasks [76] and self-supervised tasks [77, 144] has achieved great success. The parameters of pretrained GNNs are released for downstream tasks, which may lead to private information leakage. However, existing privacy attacks mostly focus on black-box settings and do not investigate the information leakage caused by model release. Hence, privacy attacks and the corresponding defense methods for pretrained GNN models need to be explored.

Table 5. Categorization of representative graph adversarial attacks.
  Knowledge — White-box: [201], [211], [28], [37], [62]; Black-box: [37], [122], [244], [28], [245], [211], [188], [172], [19], [24]
  Capability — Evasion attack: [201], [211], [28], [37], [179], [122], [121], [62]; Poisoning attack: [244], [245], [172], [19], [24], [188], [203], [233], [105], [62]
  Attackers' goal — Targeted attack: [37], [188], [179], [203], [233], [244], [122], [28], [105], [19]; Untargeted attack: [172], [245], [211], [62], [19], [201], [121], [24]

Trade-off Between Privacy and Utility. Though methods that apply differential privacy, federated learning, or adversarial learning have been proposed to protect the privacy of training data, the relation between privacy protection performance and prediction accuracy is rarely discussed. For example, in differentially private GNNs [136, 152, 228], the actual performance in defending against various privacy attacks is generally not evaluated.
And in adversarial privacy-preserving methods [106, 113, 183], how to control the balance between prediction performance and privacy protection is still not well discussed.

As an extension of neural networks to graph-structured data, GNNs are also vulnerable to adversarial attacks. In addition, due to the message-passing mechanism and the graph structure, GNNs can be negatively affected by adversarial perturbations on both graph structures and node attributes. For example, Nettack [244] can fool GNNs into giving false predictions on target nodes by poisoning the training graph with unnoticeable perturbations to the graph structure and node attributes. NIPA [172] manages to significantly reduce the global node classification performance of GNNs by injecting a small number of labeled fake nodes into the training graph. The vulnerability of GNNs has raised tremendous concerns about adopting GNNs in safety-critical domains such as credit estimation and healthcare. For example, fraudsters can create several transactions with deliberately chosen high-credit users to escape GNN-based fraud detectors, which can cause tremendous losses to individuals and institutions. Hence, developing robust GNNs is another important aspect of trustworthiness, and many efforts have been made. There are already several comprehensive surveys about adversarial attacks and defenses on graphs [29, 87, 171, 197]. Therefore, in this section, we briefly give an overview of adversarial learning on graphs, but focus more on methods in emerging directions such as scalable attacks, graph backdoor attacks, and recent defense methods.

Graph adversarial attacks aim to degrade the performance of GNNs or to make GNN models give attacker-desired outputs by injecting deliberate perturbations into the graph dataset. Generally, attackers are constrained in their knowledge about the data and the model they attack, as well as in their capability of manipulating the graph. In this subsection, we introduce the threat models in various aspects to show the different settings of graph adversarial attacks.

Attackers' Knowledge. Similar to privacy attacks, attackers need to possess certain knowledge about the dataset and the target model to achieve their adversarial goal. Based on whether the model parameters are known to the attacker, attacks can be split into white-box and black-box attacks:
• White-box Attack: In this setting, the attacker knows all information about the model parameters and the training graph, such as the adjacency matrix, attribute matrix, and labels [28, 62, 211]. Since it is difficult to obtain all of this information in the real world, the white-box attack is less practical but is often used to show the worst-case performance of a model under adversarial attacks.
• Black-box Attack: In the black-box attack [19, 37, 244], attackers do not have access to the model parameters but can access the graph dataset. More specifically, the full or partial graph structure and node features could be accessible to attackers. Attackers may also be allowed to obtain the labels used for training or to query the outputs of the target GNN, which can be used to mimic the predictions of the target model to achieve the black-box attack.

Attackers' Capability. In adversarial attacks, adversarial perturbations are added to data samples to mislead the target GNN model into giving the output desired by the attacker.
According to the stage at which the attack occurs, attacks can be divided into poisoning attacks and evasion attacks:
• Evasion Attack: The perturbations in evasion attacks [37, 211] are added to the graph at the test stage, where the GNN model parameters have already been trained and cannot be affected by attackers. Depending on whether the attacker knows the model parameters, evasion attacks can be further categorized into white-box or black-box evasion attacks.
• Poisoning Attack: In poisoning attacks [172, 244, 245], the training graph is poisoned before the GNN is trained. The GNN model trained on the poisoned dataset will exhibit certain designed behaviors such as misclassifying target nodes or having low overall performance. As the poisoning attack happens before model training, it belongs to the black-box attack where the model parameters are unknown. Thus, attackers usually train a surrogate model and poison the graph to reduce the performance of the surrogate model. Due to the transferability of adversarial attacks, the poisoned graph can also reduce the performance of the target GNN trained on it. Currently, many graph mining tasks such as semi-supervised node classification and link prediction adopt transductive learning, where the test samples participate in the training phase. Therefore, most existing works focus on poisoning attacks, which are often more practical for graph mining.

Based on the way the graph data is perturbed, adversarial attacks can be categorized into manipulation attacks, node injection attacks, and backdoor attacks:
• Manipulation Attack: In the manipulation attack, an attacker manipulates either the graph structure or node features to achieve the attack goal. For example, Nettack [244] perturbs the graph by deliberately adding/deleting edges and revising node attributes with a gradient-based greedy search algorithm. To make the perturbations more unnoticeable, ReWatt [122] poisons the training graph by rewiring edges with reinforcement learning.
• Node Injection Attack: Different from the manipulation attack that modifies the original graph, the node injection attack aims to achieve the adversarial goal by injecting malicious nodes into the graph [172, 179, 188]. Compared to the manipulation attack, the node injection attack is more practical. For example, in an e-commerce network, attackers would need to hack servers or user accounts to manipulate the network, while injecting new malicious accounts is much easier.
• Backdoor Attack: Backdoor attacks [203, 233] inject backdoor triggers into the training set to poison the model. The backdoor trigger is a predefined or learned pattern, such as a single node or a subgraph. The attacker relabels the training nodes/graphs attached with the backdoor trigger to the target label so that a GNN trained on the poisoned dataset will predict any test sample carrying the backdoor trigger as the target label. Compared with other types of adversarial attacks, backdoor attacks on GNNs are still at an early stage.

Attackers' Goal. Based on whether the goal of the attacker is to misclassify a set of target instances or to reduce the overall performance of the GNN model, threat models can also be categorized as:
• Targeted Attack: The attacker aims to fool the GNN model into misclassifying a set of target nodes [233, 244]. Meanwhile, the attacker may want the performance of the target model on non-targeted samples to remain unchanged to avoid being detected.
• Untargeted Attack: The untargeted attack [172, 245] aims to reduce the overall performance of the target GNN model.
Since the evasion attack cannot affect the model parameters, the untargeted attack can only be achieved by poisoning the dataset in the training stage. The categorization of existing representative graph adversarial attacks in different aspects is summarized in Table 5. Next, we give more details of these methods. In this subsection, we first give a unified formulation of adversarial attacks, followed by representative methods in evasion attacks and poisoning attacks. Finally, we survey recent advances in backdoor attacks and scalable attacks on GNNs.

4.2.1 Unified Formulation of Graph Adversarial Attack. Adversarial attacks can be conducted on both node-level and graph-level tasks. Since the majority of the literature focuses on the node classification problem, we give a unified formulation of node-level graph adversarial attacks as an example, which can be easily extended to other tasks.

Definition 4.1. Given a graph G = {V, E} with adjacency matrix A and attribute matrix X, let V_t be the set of nodes to be attacked. The goal of the adversarial attack is to find a perturbed graph Ĝ that meets the unnoticeability requirement by minimizing the following objective function:

min_{Ĝ ∈ Φ(G)}  Σ_{v ∈ V_t} L_atk(f_{θ*}(Ĝ)_v, y_v),   s.t.  θ* = arg min_θ L_train(f_θ(G′)),   (10)

where L_atk is the loss for attacking, which is typically set as −ℓ(f_{θ*}(Ĝ)_v, y_v) with ℓ(·) being the cross entropy loss. L_train is the loss for training the target model; generally, in node classification, we apply the classification loss L_train = Σ_{v ∈ V_L} ℓ(f_θ(G′)_v, y_v). As for G′, it can be either Ĝ or G, which corresponds to the poisoning attack and the evasion attack, respectively. As for the search space Φ(G), apart from the way of perturbing the graph, it is also constrained by a budget to ensure unnoticeable attacks. Specifically, the budget constraint is typically implemented as ∥Â − A∥ + ∥X̂ − X∥ ≤ Δ, where Δ denotes the budget value. For the non-targeted attack, V_t is set to all the unlabeled nodes V_U, and y_v is the prediction of unlabeled node v from a GNN trained on the clean graph G.

4.2.2 Evasion Attacks. Evasion attacks focus on the inductive setting and aim to change the predictions on new nodes/graphs. Based on the applied techniques, evasion attack methods can be split into gradient-based methods and reinforcement learning-based methods.

Gradient-Based Methods. Early adversarial attacks [28, 37, 201, 211] generally focus on the white-box evasion attack to demonstrate the vulnerability of GNNs and assess the robustness of models under worst-case situations. Since the model parameters are available in white-box evasion attacks, it is natural to optimize the objective function in Eq.(10) by gradient descent. More specifically, the objective function of the white-box evasion attack can be rewritten as:

min_{Ĝ ∈ Φ(G)}  Σ_{v ∈ V_t} L_atk(f_θ(Ĝ)_v, y_v),   (11)

where θ represents the model parameters of the target GNN. However, due to the discreteness of the graph structure, it is challenging to directly solve the optimization problem in Eq.(11). Therefore, FGA [28] and GradMax [37] use a gradient-based greedy algorithm to iteratively modify the connectivity of node pairs within the attack budget. Xu et al. [211] adopt projected gradient descent to handle the discreteness of the perturbed adjacency matrix. Instead of directly using the raw derivatives, Wu et al. [201] use integrated gradients to identify the optimal edges and node attributes to be modified for the attack.
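As an illustration of the greedy gradient-based attacks discussed above (in the spirit of FGA and GradMax, but not a faithful reimplementation of either), the sketch below repeatedly computes the gradient of the attack loss in Eq.(11) with respect to a relaxed adjacency matrix and flips the single entry whose first-order effect is most harmful, until the budget is exhausted. It assumes a differentiable model(X, A_norm) such as the GCN sketched in Section 2, and it ignores feature perturbations and graph-property constraints.

    import torch
    import torch.nn.functional as F

    def greedy_gradient_structure_attack(model, A, X, target, true_label, budget,
                                         normalize_adj):
        """Greedily flip up to `budget` adjacency entries to mislead `target`."""
        A = A.clone()
        for _ in range(budget):
            A_var = A.clone().requires_grad_(True)
            logits = model(X, normalize_adj(A_var))
            # attack loss of Eq.(11): negative cross entropy on the target node
            loss = -F.cross_entropy(logits[target].unsqueeze(0),
                                    torch.tensor([true_label]))
            grad = torch.autograd.grad(loss, A_var)[0]
            with torch.no_grad():
                # first-order gain of flipping entry (i, j): grad_ij * (1 - 2 A_ij)
                score = grad * (1.0 - 2.0 * A)
                score.fill_diagonal_(float('inf'))   # never modify self-loops
                idx = torch.argmin(score).item()
                i, j = divmod(idx, A.size(0))
                A[i, j] = A[j, i] = 1.0 - A[i, j]    # apply the most harmful flip
        return A

The same greedy loop can be run against a surrogate model instead of the target model, which is how several of the black-box and poisoning variants discussed next proceed.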
Recently, there have been several attempts to develop gradient-based evasion attacks [121, 179] in the more practical black-box setting, which can be applied to the graph classification task and to node classification on evolving graphs. By exploiting the connection between the backward propagation of GNNs and random walks, Ma et al. [121] investigate the relation between the change of the classification loss under perturbations and the random walk transition matrix. Based on that, they generalize the white-box gradients into model-independent importance scores based on PageRank, which avoids using model parameters. Tao et al. [179] investigate the black-box evasion attack with single-node injection. Without knowing the model parameters, they train a surrogate model on the training graph and attack the surrogate model to inject a fake node into the graph. A parameterized attribute generator and edge generator are adopted for the node injection attack on unseen test nodes.

Reinforcement Learning-Based Methods. In black-box evasion attacks, the lack of model parameters challenges gradient-based methods. However, in many scenarios such as drug property prediction, the attacker is allowed to query the target model. In this situation, reinforcement learning can be employed to conduct evasion attacks by learning the optimal actions for graph perturbation [37, 122]. RL-S2V [37] is the first work to apply reinforcement learning for the black-box targeted attack. It models the attack process as a Markov Decision Process (MDP) defined as:
• State: The state at time t is represented by the tuple (Ĝ_t, v), where Ĝ_t is the intermediate modified graph at time t and v is the target node to be attacked.
• Action: The attacker in RL-S2V needs to add or delete an edge in each step, which is equivalent to selecting a node pair. It decomposes the node pair selection action a_t ∈ V × V into a hierarchical structure of sequentially selecting two nodes, i.e., a_t = (a_t^(1), a_t^(2)), where a_t^(1), a_t^(2) ∈ V.
• Reward: The goal of the attack is to fool the target classifier on the target node v. In RL-S2V, no reward is given at intermediate steps; a non-zero reward is only given at the end as r(Ĝ_T, v) = 1 if the target classifier misclassifies v on the final perturbed graph Ĝ_T, and r(Ĝ_T, v) = −1 otherwise.
• Termination: The process stops once the agent has modified the budgeted number of edges.

RL-S2V adopts the Q-learning algorithm to solve the MDP for the targeted attack, and a parameterized Q-function is implemented for better transferability. A similar reinforcement learning framework is also applied in ReWatt [122]. To make the perturbations more unnoticeable, ReWatt perturbs the graphs by rewiring, i.e., breaking an edge and rewiring it to another node to form a new edge. Hence, ReWatt employs a different action space that consists of all valid rewiring operations.

4.2.3 Poisoning Attacks by Manipulation and Node Injection. Apart from evasion attacks, extensive poisoning attack methods [37, 122, 172] have been investigated for the transductive learning setting, which are more practical for semi-supervised node classification. The majority of them focus on perturbing the graph data by manipulating the original graph [244, 245] or injecting fake nodes [172, 179]. Similar to evasion attacks, poisoning attacks by manipulation and node injection can be split into gradient-based methods and reinforcement learning-based methods.

Gradient-based Methods. As Eq.(10) shows, the poisoning attack can be formulated as a bi-level optimization problem. To address this problem, various methods [28, 211, 244, 245] that perform gradient-based attacks on a static/dynamic surrogate model have been investigated. For instance, Nettack [244] deploys a tractable surrogate model, i.e., f_s(Â, X) = softmax(Â² X W), to conduct the poisoning targeted attack. The surrogate model is first trained to capture the major information of graph convolutions.
Then, the poisoning attack is reformulated as learning perturbations by attacking the surrogate model:

max_{(Â,X̂)∈Φ(A,X)}  max_{c≠y_u} [ ln f_s(Â, X̂)_{u,c} − ln f_s(Â, X̂)_{u,y_u} ],   (13)

where u is the target node and Φ(A, X) denotes the search space under the unnoticeability constraint, which considers both the attack budget and the preservation of important graph properties. To solve Eq.(13), Nettack proposes an efficient way of evaluating the change of the surrogate loss after adding/removing a feature or an edge. The final poisoned graph is obtained by repeatedly selecting the most malicious operation in a greedy manner until the budget is reached. Similarly, FGA [28] adopts a static surrogate model for the poisoning attack. Different from Nettack, FGA directly adopts the GCN model and selects the perturbations based on gradients. As the static surrogate model is trained on the raw graph, it cannot accurately reflect the performance of a GNN trained on the poisoned graph. Hence, many works [211, 245] adopt a dynamic surrogate model to account for the effects of the added perturbations on the target model. In [211], min-max topology attack generation is employed for the untargeted attack. Specifically, the inner maximization updates the surrogate model on the partially modified graph, and the outer minimization conducts a projected gradient descent topology attack on the GNN model. Metattack [245] introduces meta-learning to solve the bi-level optimization of the untargeted attack. Essentially, the graph structure matrix is treated as a hyperparameter and the meta-gradient is computed as

∇_G^{meta} := ∇_G L_atk(f_{θ*}(G)),   s.t.   θ* = opt_θ(L_train(f_θ(G))),

where θ*, i.e., the parameters of the surrogate GNN model, is usually obtained by gradient descent over a number of inner iterations. In each inner iteration, θ_{t+1} is obtained by

θ_{t+1} = θ_t − α ∇_{θ_t} L_train(f_{θ_t}(G)),

where α is the learning rate of the gradient descent in the inner iteration. Though surrogate models provide a way to conduct poisoning attacks, if the target model architecture differs a lot from the surrogate model, the poisoned graph might not be able to significantly reduce the performance of the target model trained on it. Thus, instead of using surrogate models, some works [19, 24] design generalized attack loss functions to poison the graph and improve transferability. For example, Bojchevski et al. [19] exploit eigenvalue perturbation theory to efficiently approximate the unsupervised DeepWalk loss and manipulate the graph structure accordingly. They further demonstrate that the perturbed graph structure is transferable to various GNN models such as GCN. Graph embedding learning is formulated as a general signal processing problem with a corresponding graph filter in GF-Attack [24], which enables a general attacker that theoretically provides transferability of adversarial samples. GF-Attack is able to perturb both the adjacency matrix and the feature matrix, which leads to better attacking performance than [19].
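To illustrate the meta-gradient idea behind Metattack, the following is a minimal sketch (assuming a differentiable linearized surrogate and dense tensors; not the authors' implementation): the surrogate parameters are trained by unrolled gradient descent while the relaxed adjacency matrix stays in the computation graph, and the attack loss is then differentiated through the whole inner training trajectory to obtain a gradient with respect to the graph structure.

```python
import torch
import torch.nn.functional as F

def meta_gradient(adj, features, labels, train_mask, hidden=16, inner_steps=20, lr=0.1):
    """Sketch of a Metattack-style meta-gradient w.r.t. the adjacency matrix."""
    adj = adj.clone().requires_grad_(True)           # relaxed, differentiable structure
    n_feat, n_class = features.size(1), int(labels.max()) + 1
    w1 = (torch.randn(n_feat, hidden) * 0.1).requires_grad_(True)
    w2 = (torch.randn(hidden, n_class) * 0.1).requires_grad_(True)

    def surrogate(a, w1, w2):
        # Linearized two-layer GCN-like surrogate: A (A X W1) W2
        # (self-loops and normalization omitted for brevity)
        return a @ (a @ features @ w1) @ w2

    # Inner loop: train the surrogate with unrolled gradient descent,
    # keeping the graph in the autograd graph (create_graph=True).
    for _ in range(inner_steps):
        loss_train = F.cross_entropy(surrogate(adj, w1, w2)[train_mask],
                                     labels[train_mask])
        g1, g2 = torch.autograd.grad(loss_train, (w1, w2), create_graph=True)
        w1, w2 = w1 - lr * g1, w2 - lr * g2

    # Attack loss; Metattack uses (self-)labels on unlabeled nodes in practice,
    # here we simply maximize the training loss for brevity.
    loss_atk = -F.cross_entropy(surrogate(adj, w1, w2)[train_mask], labels[train_mask])
    meta_grad, = torch.autograd.grad(loss_atk, adj)
    return meta_grad   # entries with large magnitude indicate promising edge flips
```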
Reinforcement Learning-based Methods. Several reinforcement learning-based methods [172] have also been investigated for poisoning attacks. The node injection poisoning attack (NIPA) [172] employs reinforcement learning to generate and attach fake nodes for untargeted attacks. The overall reinforcement learning framework is similar to RL-S2V described in Sec. 4.2.2, but the action space and the reward are designed for node injection poisoning. Specifically, in each step, the action is to connect a fake node to the graph and assign a class label to the fake node so as to effectively fool the GNN trained on the poisoned graph. The action is decomposed into three steps: (i) select a fake node; (ii) select a real node and connect it to the selected fake node; and (iii) assign a label to the selected fake node. The reward is defined based on the performance of the surrogate model trained on the poisoned graph. Let p_t denote the attack success rate of the surrogate model trained on Ĝ_t. The reward is defined as r(s_t, a_t) = 1 if p_{t+1} > p_t, where s_t and a_t denote the state and action at time t; otherwise, r(s_t, a_t) = 0.
4.2.4 Backdoor Attacks. In this subsection, we review the recently emerging graph backdoor attacks. In backdoor attacks, the attacker aims to make the host system misbehave when a pre-defined trigger is present. Backdoor attacks on graphs can be applied in various scenarios and can largely threaten the applications of GNNs. For example, an attacker may inject a backdoor trigger into a drug evaluation model possessed by the community. Then, even a useless drug containing the backdoor trigger from the attacker will be classified as an effective medicine. In addition, the backdoor can be applied to federated learning [209], which threatens the safety of the final global model. Though successful backdoor attacks may lead to enormous losses, there are only a few works on backdoor attack methods on graphs [203, 233]. Next, we introduce each graph backdoor attack in detail. SBA [233] proposes a subgraph-based backdoor attack for graph classification. The attacker injects a subgraph-based trigger G_t into a fraction of graphs D_P ⊆ D and changes the labels of these graphs to an attacker-chosen label y_t. Let a(·) denote the operation of attaching the trigger to a graph. The set of poisoned graphs can be formulated as D̃_P = {(a(G_i, G_t), y_t) : G_i ∈ D_P}. Then, the target model trained on the poisoned dataset D̃_P ∪ D_C, where D_C = D − D_P, may treat G_t as a pattern of the target label y_t. As a result, the target model will misclassify test graphs containing the backdoor trigger as y_t. In [233], the subgraph trigger is generated by random graph generation algorithms such as the Erdős-Rényi model and the small-world model. They also investigate the impacts of the trigger size, trigger density, and the number of poisoned graphs, which demonstrates the vulnerability of GNN models to backdoor attacks. GTA [203] focuses on injecting a backdoor into a pretrained model for node/graph-level tasks. The backdoor is expected to remain effective even after fine-tuning the pretrained model on downstream tasks. Hence, GTA formulates it as the following bi-level optimization problem:

min_{G_t} L_atk(f_{θ*}(D̃)),   s.t.   θ* = arg min_θ L_train(f_θ(D ∪ D̃)),   (15)

where D and D̃ represent the clean training samples and the samples poisoned by G_t, respectively. In [203], the downstream tasks are assumed to be unknown. They adopt unsupervised training and design an attack loss that forces the poisoned graphs/nodes to have embeddings similar to a predefined sample. Eq.(15) is solved by iteratively conducting the inner and outer optimization with a first-order approximation. In addition, a parameterized backdoor generator is adopted to obtain a personalized backdoor for each test graph/node.
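The data-poisoning step of a subgraph-based backdoor attack like SBA can be sketched in a few lines; the snippet below (an illustration under assumed NetworkX data structures, not the authors' code) generates an Erdős-Rényi trigger, attaches it to a random subset of training graphs, and relabels them with the attacker-chosen class.

```python
import random
import networkx as nx

def poison_dataset(graphs, labels, target_label, poison_frac=0.1,
                   trigger_size=4, trigger_density=0.8, seed=0):
    """Sketch of SBA-style backdoor poisoning for graph classification.

    graphs: list of NetworkX graphs, labels: list of class labels.
    Returns the poisoned dataset and the trigger (needed again at test time).
    """
    rng = random.Random(seed)
    trigger = nx.erdos_renyi_graph(trigger_size, trigger_density, seed=seed)
    poisoned_idx = set(rng.sample(range(len(graphs)), int(poison_frac * len(graphs))))

    new_graphs, new_labels = [], []
    for i, (g, y) in enumerate(zip(graphs, labels)):
        if i in poisoned_idx:
            g = nx.disjoint_union(g, trigger)          # append trigger nodes
            # connect one random original node to the first trigger node
            host = rng.randrange(len(g) - trigger_size)
            g.add_edge(host, len(g) - trigger_size)
            y = target_label                           # attacker-chosen label
        new_graphs.append(g)
        new_labels.append(y)
    return new_graphs, new_labels, trigger
```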
4.2.5 Scalable Attacks. In real-world scenarios, the graph to be attacked is often large-scale. For example, the Facebook social network contains billions of users. It is challenging to conduct adversarial attacks on such large-scale networks. First, the majority of existing works focus on manipulation attacks that try to find the optimal node pairs to manipulate, which requires a very high computation cost. For example, the time and space complexity of Metattack is O(N²), with N being the number of nodes in the graph, as it requires computing the meta-gradient of each node pair. Second, large-scale graphs may exhibit different properties. Therefore, attack methods designed on small graphs could be ineffective on large graphs. However, there are only a few works investigating the vulnerability of GNNs on large-scale graphs [62]. In this subsection, we present three promising directions for scalable attacks. Perturbation Sampling: As mentioned, gradient-based manipulation attacks need to compute the gradients of every pair of nodes to decide the perturbation on the topology, resulting in unaffordable time and space complexity. In [62], Projected Randomized Block Coordinate Descent (PR-BCD) is proposed to sample the perturbations and update their corresponding probability scores so as to reduce the time complexity. Specifically, the manipulation attack on the graph topology is modeled as min_P L_atk(f_θ(A ⊕ P, X)), where ∑_{i,j} P_{ij} ≤ Δ and P_{ij} = 1 denotes an edge flip. The operation ⊕ stands for element-wise edge flipping. To optimize P, it is relaxed to P ∈ [0, 1]^{N×N}, where P_{ij} is the probability of flipping the edge (v_i, v_j) for the attack. In each training iteration, PR-BCD randomly samples a fixed number of candidate perturbations, which leads to a sparse adjacency matrix for both the forward and backward computation. The final perturbations are obtained from the flipping probability score matrix P. This method is applicable to both targeted and untargeted attacks. Experiments on massive graphs with over 100 million nodes empirically show the effectiveness of PR-BCD and the vulnerability of GNNs on large-scale graphs. Candidate Reduction: For an attack on a target node v_t, the search space can actually be reduced, as many candidate topology manipulations are ineffective in affecting the prediction on v_t. For instance, linking two nodes far from the target node can hardly change the representation of node v_t. And linking a node that shares the same label as v_t to v_t can even lead to a more robust graph structure [34]. Therefore, SGA [105] develops a mechanism of perturbation candidate reduction to avoid excessive computation. First, for a manipulation on a node pair (v_1, v_2), one of the nodes, say v_1, should be in the computation graph of the target node v_t, i.e., the K-hop subgraph of node v_t for a K-layer GNN. Second, for the other node v_2, its class label should be the one to which the original class of the target node v_t is most easily misclassified. Third, SGA assumes that a node v_2 is more likely to be selected if it can largely affect the prediction on v_t when the manipulation is directly on the node pair (v_t, v_2). Therefore, several best candidates of v_2 can be selected for each manipulation. With these strategies, the time complexity is reduced to be linear in the graph size. Node Injection Attacks: When we inject a malicious node for adversarial attacks, we only need to consider d · N options, where d and N are the expected degree of the fake node and the graph size, respectively. Since d is generally very small and can even be set to one [179], the node injection attack is naturally more scalable than the manipulation attack. The node injection attack is firstly proposed in NIPA [172]. NIPA uses reinforcement learning to inject fake nodes to degrade the overall performance of the target GNN, which has been introduced in Sec. 4.2.3. Several follow-up works [179, 188] further investigate node injection attacks for better scalability.
AFGSM [188] approximately linearizes the target GCN and derives a closed-form solution for a node injection targeted attack, which has a much lower time cost. Experiments on Reddit with over 100K nodes demonstrate its effectiveness and efficiency for targeted attacks. A targeted attack by injecting a single node is explored in G-NIA [179], which avoids the cost of generating multiple nodes for the attack. G-NIA [179] models the optimization problem of the node injection attack with a parameterized model, which contains a node generator and an edge generator. The node generator and edge generator are trained with the attack loss on training nodes to gain the generalization ability of attacking test nodes.
As GNNs are vulnerable to adversarial attacks, various robust graph neural networks against adversarial attacks have been proposed, which can be generally categorized into three types: Adversarial Training, Graph Denoising, and Certifiable Robustness. Next, we introduce the representative methods of each category as well as some defense methods that lie in other categories.
Adversarial Training. Adversarial training is a popular and effective approach to defend against adversarial evasion attacks, and it has been widely applied in computer vision [66]. Generally, adversarial training simultaneously generates adversarial samples that can fool a classifier and forces the classifier to give similar predictions for a clean sample and its perturbed version, so as to improve the robustness of the classifier. Adversarial training [38, 41, 55, 187, 211] has also been investigated to defend against graph adversarial attacks, which can be generally formulated as the following min-max game:

min_θ max_{Δ_A ∈ P(A), Δ_X ∈ P(X)} L_train(f_θ(A + Δ_A, X + Δ_X)),

where L_train is the classification loss on the labeled nodes, Δ_A and Δ_X stand for the perturbations on the topology structure and node attributes, respectively, and P(A) and P(X) denote the allowable perturbations within the attack budget. Adversarial training on GNNs is firstly explored in [211], where the perturbations on the graph topology are generated by a PGD algorithm. Some variants of [211] are investigated in [27, 194]. Considering that node classification is a semi-supervised learning task by nature, virtual graph adversarial training [41, 55] is applied to further encourage the smoothness of model predictions on both labeled and unlabeled nodes. In [41, 55], only node feature perturbations are considered in the virtual graph adversarial training:

min_θ L_train + λ ∑_{v∈V} max_{Δ_x ∈ P(X)} KL( p(y|x_v; θ) ∥ p(y|x_v + Δ_x; θ) ),

where λ is the weight of the virtual adversarial training regularizer and p(y|x_v; θ) denotes the prediction for node v. The term KL(p(y|x_v; θ) ∥ p(y|x_v + Δ_x; θ)) enforces the predictions on unlabeled nodes to be similar with and without perturbation.
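As an illustration of the min-max game above restricted to feature perturbations, the sketch below (a simplified, assumption-laden example rather than the procedure of any specific paper) generates an L-infinity-bounded PGD perturbation of the node features in the inner loop and trains the GNN on the perturbed input in the outer loop; the `gnn(adj, x)` signature is assumed.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(gnn, optimizer, adj, features, labels, train_mask,
                              epsilon=0.01, pgd_steps=5, pgd_lr=0.005):
    """One outer step of feature-space adversarial training (min-max game)."""
    # Inner maximization: find a feature perturbation that maximizes the training loss.
    delta = torch.zeros_like(features, requires_grad=True)
    for _ in range(pgd_steps):
        loss = F.cross_entropy(gnn(adj, features + delta)[train_mask],
                               labels[train_mask])
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += pgd_lr * grad.sign()            # gradient ascent on the loss
            delta.clamp_(-epsilon, epsilon)          # project back into the budget

    # Outer minimization: update model parameters on the perturbed features.
    optimizer.zero_grad()
    loss = F.cross_entropy(gnn(adj, (features + delta).detach())[train_mask],
                           labels[train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()
```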
Certifiable Robustness. Though various approaches such as graph adversarial training have been proposed to improve robustness against adversarial samples, new attacks may be developed to invalidate the defense methods, leading to an endless arms race. To address this problem, recent works [184, 246] analyze the certifiable robustness of GNNs to understand how worst-case attacks will affect the model. Certifiable robustness aims to provide certificates to nodes that are robust to potential perturbations in a considered space. For each node v ∈ V with label y_v, the certificate m(v; G) can be obtained by solving the following optimization problem:

m(v; G) = min_{Ĝ∈Φ(G)} [ f(Ĝ)_{v,y_v} − max_{c≠y_v} f(Ĝ)_{v,c} ],

where f(Ĝ)_{v,c} denotes the predicted logit of node v for class c and Φ(G) indicates all allowable perturbed versions of the graph. If m(v; G) > 0, the GNN is certifiably robust w.r.t. node v in the considered space Φ(G). In other words, no adversarial sample in Φ(G) can change the target model's prediction on node v. The work [246] firstly investigates the certifiable robustness against perturbations on node features. Some following works [20, 84, 184, 247] further analyze the certifiable robustness under topology attacks. For instance, [247] proposes a branch-and-bound algorithm that obtains a tight bound on the global optimum of the certificates for topology attacks. In [20], the certificates of PageRank-based models and a family of GNNs such as APPNP [97] are efficiently computed by exploiting connections to PageRank and Markov decision processes. A technique of randomized smoothing is applied in [184] to give certifiable guarantees for any GNN. Randomized smoothing injects noise into the test samples to mitigate the negative effects of adversarial perturbations, and the obtained certificates are proven to be tight. Apart from methods of computing certificates for a trained GNN, robust training that aims to increase the certifiable robustness is also investigated in [20, 246]. The main idea is to directly maximize the worst-case margin m(v; G) during training to encourage the model to learn more robust weights. In particular, a robust hinge loss can be added to the training loss to improve certifiable robustness, which can be formally written as

L = L_cls + ∑_{v∈V_L} max(0, η − m(v; G)),   (19)

where L_cls denotes the classification loss and η > 0 is the hyperparameter of the hinge loss. With Eq.(19), the worst-case margin m(v; G) is encouraged to be larger than η.
Graph Denoising. Adversarial training and certifiable robustness are effective for training robust GNNs to defend against evasion attacks, i.e., attacks that happen in the test stage. However, they cannot deal with a poisoned training graph that has been perturbed by adversarial attacks. In early investigations of poisoning attacks [201], topology attacks were found to be more effective and thus favored by poisoning attackers, while feature-only perturbations generally fail to change the predictions of the target node due to the high dimensionality of node attributes. Therefore, a promising direction for defending against poisoning attacks is to denoise the graph structure to reduce the negative effects of the injected perturbations. Based on the way the graph is denoised, existing methods can be split into Pre-processing, Graph Structure Learning, and Attention-Based methods. Pre-processing. Pre-processing based approaches first denoise the graph using heuristics about network properties or attack behaviors. Then, the GNN model can be trained on the denoised graph to give correct predictions that are not affected by the poisoning attacker. The work [201] firstly proposes a simple and effective pre-processing defense method based on the following two observations on graph adversarial attacks: (i) perturbing graph structures is more effective than modifying node attributes; and (ii) attackers tend to add adversarial edges linking dissimilar nodes instead of deleting existing edges. Hence, GCN-Jaccard is proposed in [201] to defend against adversarial attacks by eliminating the edges connecting nodes with low Jaccard similarity of node features. Experimental results show the effectiveness and efficiency of this defense method.
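The Jaccard-based pre-processing step is simple enough to sketch directly; the function below (an illustrative reimplementation with binary features, not the authors' code) removes every edge whose endpoints have a Jaccard similarity below a chosen threshold before the GNN is trained.

```python
import numpy as np

def jaccard_preprocess(adj, features, threshold=0.01):
    """GCN-Jaccard-style defense: drop edges between dissimilar nodes.

    adj:      dense binary adjacency matrix, shape [N, N]
    features: binary node feature matrix, shape [N, D]
    Returns a cleaned adjacency matrix.
    """
    adj = adj.copy()
    rows, cols = np.nonzero(np.triu(adj, k=1))       # visit each undirected edge once
    for i, j in zip(rows, cols):
        intersection = np.logical_and(features[i], features[j]).sum()
        union = np.logical_or(features[i], features[j]).sum()
        jaccard = intersection / union if union > 0 else 0.0
        if jaccard < threshold:                      # likely an adversarial edge
            adj[i, j] = adj[j, i] = 0
    return adj
```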
Apart from the observations about the properties of linked nodes, adversarial attacks are also found to mainly affect the high-rank spectrum of the adjacency matrix, which corresponds to small singular values [51]. Based on this observation, low-rank approximation with truncated SVD is used to denoise the graph and resist poisoning attacks. Specifically, a truncated SVD that contains only the top-k singular values of the adjacency matrix is retained, and the denoised graph is reconstructed from the truncated SVD. Their experiments show that keeping only the top 10 singular values of the adjacency matrix is sufficient to defend against Nettack [244]. Graph Structure Learning. Graph structure learning methods [34, 88, 119] aim to simultaneously learn a denoised graph and a GNN model that gives accurate predictions based on the denoised graph. Inspired by the fact that adversarial attacks lead to a high-rank adjacency matrix, Pro-GNN [88] proposes to learn a clean adjacency matrix S under the constraints that (i) S is low-rank and close to the raw perturbed adjacency matrix A, such that the adversarial edges are likely to be removed; (ii) S should facilitate node classification; and (iii) S should maintain feature smoothness, i.e., link nodes with similar features. The overall objective function of Pro-GNN can be written as:

min_{θ,S} L_cls(f_θ(X, S)) + α ∥S − A∥_F² + β ∥S∥_* + γ tr(XᵀL̂X),

where L_cls(f_θ(X, S)) is the classification loss using S, ∥S∥_* stands for the nuclear norm of the learned adjacency matrix S, which encourages S to be low-rank, and the last term tr(XᵀL̂X) encourages the learned adjacency matrix to link nodes with similar features, where L̂ is the Laplacian matrix of S. Similar to Pro-GNN, PTDNet [119] also adopts a low-rank constraint and learns to drop noisy edges in an end-to-end manner. But different from Pro-GNN, which directly optimizes the adjacency matrix of the denoised graph, PTDNet deploys a parameterized denoising network that predicts whether to remove an edge based on the representations of its two endpoint nodes from a GNN model. Though defense methods using low-rank constraints are proven to be effective, the computational cost of the nuclear norm is too expensive for large-scale graphs. Recently, a robust structural noise-resistant GNN (RS-GNN) [34] is proposed to learn a link predictor that efficiently eliminates/down-weights the noisy edges with weak supervision from the adjacency matrix. In real-world graphs, nodes with similar features and labels tend to be linked, while noisy edges would link nodes with dissimilar features. Therefore, RS-GNN deploys an MLP to predict the weight of the link between v_i and v_j by w(i, j) = σ(h_iᵀh_j), where h_i = MLP(x_i) and σ is an activation function such as the sigmoid. A novel feature-similarity-weighted edge-reconstruction loss is used to train the link predictor so that lower weights are assigned to noisy edges. The link predictor is further utilized to predict missing links in the graph, which can involve more unlabeled nodes in the training to address the challenge of label sparsity. Reinforcement learning is also applied in graph structure learning for robust representation learning [190]. Specifically, the Graph Denoising Policy Network (GDPNet) [190] focuses on denoising the one-hop subgraph of each node. Whether to involve the neighbors of a node is determined sequentially. Therefore, the action at step t is whether the selected node v_j ∈ N(v_i) should be linked with v_i. The state s_t = [h_i^t, h_j] contains the representation of node v_i obtained by aggregating the previously selected neighbors Ñ_t(v_i), together with the representation of the selected node v_j. The prediction scores on the downstream tasks are used as the reward signal for the neighbor selection phase.
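Both the pre-processing defense of [51] and the low-rank objective of Pro-GNN exploit the observation that adversarial edges live in the high-rank part of the spectrum. A minimal sketch of the pre-processing variant, i.e., reconstructing the adjacency matrix from its top-k singular components, is shown below (illustrative only; the rank and the binarization rule are assumptions).

```python
import numpy as np

def svd_denoise(adj, rank=10, binarize_threshold=0.5):
    """Low-rank (truncated SVD) denoising of a perturbed adjacency matrix."""
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    # Keep only the top-`rank` singular components; adversarial perturbations
    # tend to concentrate in the discarded small singular values.
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Optionally binarize to obtain a clean graph again (or feed the dense
    # low-rank matrix directly to a GNN that accepts weighted adjacency).
    return (low_rank > binarize_threshold).astype(adj.dtype)
```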
Attention Mechanism. Attention-based defense methods [177, 230] aim to penalize the weights of adversarial edges or nodes in the aggregation of each GNN layer to learn robust representations. In [177], PA-GNN utilizes auxiliary clean graphs that share the same data distribution with the target poisoned graph to learn to penalize adversarial edges. Specifically, adversarial edges are injected into the clean graphs to provide supervision for the penalized aggregation mechanism. Let a_ij^(l) be the attention score assigned to the edge linking v_i and v_j in the l-th GNN layer. PA-GNN encourages the attention weights of clean edges to be larger than those of the perturbed edges by a margin η, i.e.,

E_{(v_i,v_j)∈E}[ a_ij^(l) ] − E_{(v_i,v_j)∈E_P}[ a_ij^(l) ] ≥ η,

where E and E_P are the set of clean edges and the set of injected adversarial edges from the auxiliary graphs, respectively. PA-GNN further adopts meta-learning to transfer the ability of penalizing adversarial edges to the GNN on the target graph. GNNGuard [230] computes the attention scores based on the cosine similarity of node representations from the previous layer. With similarity-based attention, adversarial edges are likely to be assigned small weights since they generally link dissimilar nodes.
Other Defense Methods. In this subsection, we briefly introduce defense methods that do not belong to the aforementioned categories. Robust Aggregation. Some efforts [30, 62, 242] have been made to design robust aggregation mechanisms that restrict the negative effects of perturbations in the graph. For instance, RGCN [242] adopts Gaussian distributions as the hidden representations of nodes in each graph convolutional layer. As a result, RGCN can absorb the effects of adversarial changes in the variances of the Gaussian distributions. In [30], a median aggregation mechanism is designed to improve the robustness of GNNs. In median aggregation, the median value of each dimension of the neighbor embeddings is used to capture the context information. Only when the portion of clean nodes in the neighborhood is less than 0.5 will perturbed values be selected by median aggregation, which implies its benefit to model robustness. Following [30], a soft median aggregation mechanism is applied for scalable defense [62]; it computes a weighted mean in which the weight of an entry is based on its distance to the dimension-wise median. Extensive experiments on large-scale graphs with up to 100M nodes demonstrate its validity and efficiency. Self-Supervised Learning Defense Methods. To address the problem of lacking labels, various self-supervised learning tasks such as link prediction [95] and contrastive learning [218] have been proposed to help the representation learning of GNNs. In addition to improving prediction accuracy, some self-supervised tasks are found to improve the robustness of GNNs. For instance, SimP-GCN [86] employs a self-supervised similarity-preserving task which enforces similar representations for nodes with similar attributes. Therefore, nodes whose local graph structures are perturbed can still preserve useful representations. In contrastive learning, maximizing the representation consistency between the original graphs and augmented views with edge perturbation [36, 218] can also result in a more robust model. Some adversarial graph contrastive learning methods and variants [56, 67, 187, 210] are developed to further improve robustness by introducing an adversarial view of the graphs.
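As a concrete illustration of robust aggregation, the snippet below sketches a dimension-wise median message-passing layer in the spirit of [30] (a simplified stand-in, not the reference implementation): each node's update uses the per-dimension median over itself and its neighbors, so a minority of perturbed neighbors cannot arbitrarily shift the aggregate.

```python
import torch
import torch.nn as nn

class MedianAggregationLayer(nn.Module):
    """Sketch of a median-based GNN layer, robust to a minority of perturbed neighbors."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, neighbor_lists):
        # h: node embeddings [N, in_dim]
        # neighbor_lists[i]: Python list of neighbor indices of node i
        outs = []
        for i, neighbors in enumerate(neighbor_lists):
            idx = neighbors + [i]                     # include the node itself
            # Dimension-wise median of the (self + neighbor) embeddings.
            outs.append(h[idx].median(dim=0).values)
        aggregated = torch.stack(outs)
        return torch.relu(self.linear(aggregated))
```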
Since robust GNNs can defend against adversarial attacks, applications in safety-critical domains will particularly benefit from robust GNNs. For instance, investigations on bioinformatics graphs such as protein-protein interaction networks [212] and brain networks [93] require robust GNNs to defend against attacks in bioinformatics [176] and guarantee safety. In addition, recent research also shows the vulnerability of GNNs in knowledge graph modeling [223]. Nowadays, GNNs are widely applied to learn useful representations from knowledge graphs to facilitate various downstream tasks such as recommender systems [186]. Therefore, robust GNNs are required for applications on knowledge graphs. Some works have also attempted to apply GNNs to financial analysis such as credit estimation and fraud detection [189]. Therefore, robust GNNs are urgently needed to secure GNNs in real-world financial analysis. Scalable Robust GNNs. As discussed in Sec. 4.2.5, some initial efforts have verified that adversarial attacks can be applied to extremely large graphs to achieve the attackers' goals. However, scalable defense methods are rather limited [62]. Though some methods such as GCN-Jaccard are efficient, their defense performance is generally not good enough. For more advanced methods such as Pro-GNN, the computational cost is unaffordable for large-scale graphs. Thus, developing scalable robust GNNs is an emerging and promising direction. Robust GNNs on Heterogeneous Graphs. Many real-world graphs such as product-user networks are heterogeneous, i.e., they contain diverse types of objects and relations. Extensive Heterogeneous Graph Neural Networks (HGNNs) have been investigated for heterogeneous graphs. However, recent analysis [225] also shows that adversarial attacks bring more negative effects to meta-path-based HGNNs than to general GNNs. Despite extensive works on robust GNNs, they are dedicated to homogeneous graphs and can rarely handle heterogeneous graphs. Thus, developing robust HGNNs remains an open problem. Robust GNNs Against Label Noise. Existing works mainly focus on defending against adversarial attacks on the graph structure and node features, while noise and attacks on labels, such as the label-flipping attack [224], can also significantly degrade the performance of GNNs. Several initial efforts [36, 110, 224] have been conducted to address the challenge of label noise. For instance, the authors in [36] firstly develop a label noise-resistant GNN (NRGNN) by linking the unlabeled nodes with (pseudo-)labeled nodes of similar features, which improves the performance of GNNs against label noise or label-flipping attacks. Though promising, robust GNNs against label noise are still in an early stage and need further investigation.
Fairness is one of the most important aspects of trustworthy graph neural networks. With the rapid development of graph neural networks, GNNs have been adopted in various applications. However, recent evidence shows that, similar to machine learning models on i.i.d data, GNNs can also give unfair predictions due to societal bias in the data. The bias in the training data can even be magnified by the graph topology and the message-passing mechanism of GNNs [35]. For example, recommendation systems based on random walks are found to prevent females from rising to the most commented and liked profiles [146, 170]. A similar issue has been found in book recommendation with graph neural networks, where the GNN methods could be biased towards suggesting books with male authors [22]. These examples imply that GNNs could discriminate against minority groups and hurt the diversity of culture.
Moreover, the discrimination would largely limit the wide adoption of GNNs in other domains such as ranking of job applicants [128] and loan fraud detection [207] , and even can cause legal issues. Therefore, it is crucial to ensure that GNNs do not exhibit discrimination towards users. Hence, many works have emerged recently to develop fair GNNs to achieve various types of fairness on different tasks. In this section, we will give a comprehensive review of the cutting-edge works on fair GNNs. Specifically, we first introduce the major biases that challenge fairness of GNNs. We then describe various concepts of fairness that are widely adopted in literature, followed by categorization and introduction of methods for achieving fairness on graph-structured data. Finally, we present public datasets and applications and discuss future directions that need further investigation. Biases widely exist in real-world datasets, which can lead to unfair predictions of machine learning models. Olteanu et al. listed various biases that exist in social data [138] . Suresh et al. further discussed various types of biases that cause discrimination issues of machine learning models [174] . According to [128] , the bias in machine learning can appear in different stages such as data, algorithm and user interaction. In this paper, we mainly focus on biases in graph-structured data and on GNNs. For a comprehensive review of biases that occur in other phases such as training and evaluation of machine learning models on i.i.d data, please refer to the survey [128] . First, similar to i.i.d data, node attributes/features are often available in graph-structured data. In addition, the data collection of graph-structured data follows similar procedures to i.i.d data such as data sampling and label annotation. Thus, the following biases that widely exist on i.i.d data also exist in graphs. • Historical Bias. Various biases such as gender bias and race bias exist in the real world due to historical reasons. These biases can be embedded and reflected in the data. For a system that reflect the world accurately, it can still inflict harm on a population who experience the historical bias [174] . An example of this type of bias is the node embedding learning for link prediction [146] . In particular, users with the same sensitive attributes such as gender and race are more likely to be linked in a real-world graph. As a result, the learned node embeddings will tend to link users with the same gender/race. Then, applications such as friend recommendation that built on these types of node embeddings will reinforce the historical bias. • Representation Bias. Representation bias occurs when the collected samples under-represent some part of the population, and subsequently fails to generalize well for a subset of the use population [174] . The representation bias can be caused in several ways: (i) the target population does not reflect the use population; (ii) the target population under-represent certain groups; and (iii) the sampled data does not reflect the target population. • Temporal Bias. Temporal bias arises from differences in populations and behaviors over time [138] . For graphs, temporal bias can be caused by both the change of node attributes and graph topology. One example is the social network, where the attributes and links of users will evolve over time. • Attribute Bias. Given an attributed network and the corresponding group indicator (w.r.t. the sensitive attribute) for each node. 
For any attribute, if its distributions differ between demographic groups, then attribute bias exists in the graph [44]. Attribute bias focuses on the biases in the node attributes of the graph. In addition to the aforementioned biases, there are unique types of biases in graph-structured data due to the graph topology:
• Structural Bias. Given an undirected attributed network and the corresponding group indicator (w.r.t. the sensitive attribute) for each node, if any information propagation promotes the distribution difference between different groups at any attribute dimension, then structural bias exists in the graph [44].
• Linking Bias. Linking bias arises when network attributes obtained from user connections, activities, or interactions differ and misrepresent the true behavior of the users [138]. For instance, it is found that younger people are more closely connected than older generations on social networks [45]. Moreover, low-degree nodes are more likely to be falsely predicted [178], which leads to degree-related bias.
Generally, GNNs adopt a message-passing mechanism, which aggregates the information of neighbors to enrich the representation of the target node. As a result, the learned representations capture both the node attributes and the local topology, which can facilitate various tasks such as node classification. However, due to the biases in the topology, the message-passing mechanism of GNNs can magnify biases compared with an MLP [35]. In graphs such as social networks, nodes with similar sensitive attributes are more likely to connect to each other [45, 146]. For example, young people tend to build friendships with people of similar age on social networks [45]. The message passing in GNNs aggregates neighbor features. Thus, GNNs learn similar representations for nodes with similar sensitive information and different representations for nodes with different sensitive features, leading to severe bias in decision making, i.e., the predictions become highly correlated with the sensitive attributes of the nodes.
In this subsection, we introduce the most widely used fairness definitions, which can generally be split into two categories, i.e., group fairness and individual fairness. The principle of group fairness is to ensure that groups of people with different protected sensitive attributes receive statistically comparable treatments. Various criteria of group fairness have been proposed. Next, we introduce the most widely used group fairness definitions.
Definition 5.1 (Statistical Parity [47]). Statistical parity, also known as demographic parity, requires the prediction ŷ to be independent of the sensitive attribute s, i.e., ŷ ⊥ s. The majority of the literature focuses on binary classification with a binary sensitive attribute, i.e., ŷ ∈ {0, 1} and s ∈ {0, 1}. In this situation, statistical parity can be formally written as:

P(ŷ = 1 | s = 0) = P(ŷ = 1 | s = 1).   (22)

According to statistical parity, membership in the protected sensitive groups should have no correlation with the decision of the classifier. Given the definition in Eq.(22), fairness in terms of statistical parity can be measured by

Δ_SP = | P(ŷ = 1 | s = 0) − P(ŷ = 1 | s = 1) |.

A lower Δ_SP indicates a fairer classifier. Statistical parity can be easily extended to the multi-class and multi-category sensitive attribute setting by ensuring ŷ ⊥ s.
According to [116], let y ∈ {c_1, . . . , c_K} and s ∈ {s_1, . . . , s_S} denote the multi-class label and the multi-category sensitive attribute, where K is the number of classes and S is the number of sensitive attribute categories; the evaluation metric can then be extended accordingly by measuring the discrepancies of P(ŷ = c_k | s = s_i) across all classes c_k and sensitive categories s_i. Statistical parity is the earliest fairness definition and has been widely adopted. However, some following works [69] argue that statistical parity often cripples the utility of the model. Hence, equalized odds is proposed to alleviate this issue, which is defined as:
Definition 5.2 (Equalized Odds [69]). A predictor satisfies equalized odds with respect to the protected attribute s and class label y if the prediction ŷ and s are independent conditioned on y, i.e., ŷ ⊥ s | y. When ŷ ∈ {0, 1} and s ∈ {0, 1}, this can be formulated as:

P(ŷ = 1 | y, s = 0) = P(ŷ = 1 | y, s = 1),   ∀ y ∈ {0, 1}.

Equalized odds enforces that the accuracy is equally high in all demographic groups, punishing models that perform well only on the majority. In binary classification, we often set y = 1 as the "advantaged" outcome, such as "not defaulting on a loan" or "admission to a college". Hence, we can relax equalized odds to achieve fairness within the "advantaged" outcome group, which is known as equal opportunity.
Definition 5.3 (Equal Opportunity [69]). It requires that the probability of an instance in the positive class being assigned a positive outcome should be equal for both subgroups, i.e.,

P(ŷ = 1 | y = 1, s = 0) = P(ŷ = 1 | y = 1, s = 1).

Equal opportunity expects the classifier to give equal true positive rates across the subgroups, which allows a perfect classifier. Similar to statistical parity, equal opportunity can be measured by

Δ_EO = | P(ŷ = 1 | y = 1, s = 0) − P(ŷ = 1 | y = 1, s = 1) |.

Equalized odds and equal opportunity can be naturally extended to the multi-class and multi-category sensitive attribute setting by changing the range of the sensitive attributes and labels.
Definition 5.4 (Dyadic Fairness [107]). This can be viewed as an extension of statistical parity to link prediction. It requires the link predictor to give predictions that are independent of the sensitive attributes of the two target nodes. A link prediction algorithm satisfies dyadic fairness if the predictive score satisfies:

P(g(u, v) | s(u) = s(v)) = P(g(u, v) | s(u) ≠ s(v)),   (28)

where g(·, ·) is the link predictor, and s(u) and s(v) denote the sensitive attributes of nodes u and v, respectively. Since dyadic fairness extends statistical parity to link prediction, its evaluation metric can be simply obtained from Δ_SP by replacing the classification probability with the link prediction probability, i.e., Δ_DP = | P(g(u, v) | s(u) = s(v)) − P(g(u, v) | s(u) ≠ s(v)) |.
While group fairness can maintain fair outcomes for groups of people, a model can still behave discriminatorily at the individual level. Individual fairness is based on the understanding that similar individuals should be treated similarly.
Definition 5.5 (Fairness Through Awareness [47]). Any two individuals who are similar should receive similar algorithmic outcomes. Let x_i, x_j ∈ X be two data points in the dataset X, and let f(·) denote the mapping function. Fairness through awareness can be formulated as:

d_1(f(x_i), f(x_j)) ≤ d_2(x_i, x_j),

where d_1(·, ·) and d_2(·, ·) are two distance metrics that need to be defined in the application context.
Definition 5.6 (Counterfactual Fairness). A predictor is counterfactually fair if its prediction for an individual remains unchanged in the counterfactual world where the individual's protected attribute takes a different value, i.e., P(ŷ_{S←s}(U) = y | X = x, S = s) = P(ŷ_{S←s′}(U) = y | X = x, S = s) for all y and any value s′ of the protected attribute s. Counterfactual fairness can be viewed as individual fairness whose similarity metric treats an individual and its counterfactual sample as a similar pair.
How to evaluate individual fairness remains an under-explored research direction. In [91], a measure of individual fairness is proposed. The metrics of group fairness such as Δ_SP are also utilized for evaluating counterfactually fair GNNs [4].
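The binary group fairness metrics above are straightforward to compute from model outputs; the helper below is an illustrative sketch (names and array conventions are assumptions) that evaluates Δ_SP and Δ_EO from predicted labels, ground-truth labels, and a binary sensitive attribute.

```python
import numpy as np

def group_fairness_metrics(y_pred, y_true, sensitive):
    """Compute statistical parity and equal opportunity gaps for binary tasks.

    y_pred, y_true, sensitive: 1-D arrays of 0/1 values of equal length.
    Returns (delta_sp, delta_eo); lower values indicate a fairer classifier.
    """
    y_pred, y_true, sensitive = map(np.asarray, (y_pred, y_true, sensitive))
    s0, s1 = sensitive == 0, sensitive == 1

    # Statistical parity: |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|
    delta_sp = abs(y_pred[s0].mean() - y_pred[s1].mean())

    # Equal opportunity: |P(y_hat=1 | y=1, s=0) - P(y_hat=1 | y=1, s=1)|
    pos0, pos1 = s0 & (y_true == 1), s1 & (y_true == 1)
    delta_eo = abs(y_pred[pos0].mean() - y_pred[pos1].mean())
    return float(delta_sp), float(delta_eo)

# Example usage: delta_sp, delta_eo = group_fairness_metrics(preds, labels, gender)
```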
Extensive attempts have been made to eliminate discrimination in machine learning models on i.i.d data [128]. However, these methods cannot be directly applied to graph-structured data because of the unique biases brought by the graph topology and the message-passing mechanism. Recently, with the remarkable success of GNNs, concerns about the fairness of GNNs are attracting increasing attention. In this section, we introduce the debiasing methods for achieving fairness in GNNs. Following the categorization of fair machine learning algorithms on i.i.d data [114, 128], existing fairness-aware algorithms can be split into pre-processing methods, in-processing methods, and post-processing methods, based on the stage at which debiasing is conducted. Pre-processing approaches eliminate the bias in the data with fair pre-processing methods. In-processing approaches revise the training of machine learning models to ensure that the predictions meet the target fairness definition. Post-processing methods directly change the predicted labels to ensure fairness. Table 6 lists existing works on fair GNNs in these three categories.

Table 6. Categorization of fair models on graphs according to the stage at which debiasing is applied.
Pre-processing: [91], [44], [168]
In-processing: [35], [21], [124], [44], [191], [4], [107], [91], [43], [22], [99], [94], [146]
Post-processing: [91]

Based on the techniques they apply, we categorize the debiasing methods on graph-structured data into Adversarial Debiasing, Fairness Constraints, and others. Next, we introduce the details of the methods following this technique-based categorization. Adversarial Debiasing. Using adversarial learning [65] to eliminate bias was first investigated in fair machine learning models on i.i.d data [18, 50, 112, 123]. Several efforts [21, 35, 44, 124] have been made to extend adversarial debiasing to graph-structured data. An illustration of adversarial debiasing is presented in Figure 4a. Generally, adversarial debiasing adopts an adversary f_A to predict the sensitive attributes from the representations H of an encoder f_E, while the encoder aims to learn representations that fool the adversary and at the same time give accurate predictions for the task at hand, say node classification. With this min-max game, the final learned representations will contain no sensitive information, resulting in fair predictions that are independent of the sensitive attributes. Thus, statistical parity or dyadic fairness can be guaranteed with the following objective:

min_{θ_E} max_{θ_A} L_u(θ_E) − α L_adv(θ_E, θ_A),

where θ_E and θ_A are the parameters of the encoder f_E and the adversary f_A, respectively. L_u is the loss function that ensures the utility of the learned representations, such as a node classification loss or a link reconstruction loss. L_adv is the adversarial loss, which is generally the cross-entropy loss of the adversary's sensitive attribute prediction based on the learned node representations H. α is the hyperparameter that balances the contributions of the two loss terms.
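Before turning to concrete instantiations, a minimal sketch of this min-max training scheme is shown below (an illustration with assumed `encoder`, `classifier`, and `adversary` modules, not the setup of any particular paper): the adversary is updated to predict the sensitive attribute from the embeddings, while the encoder and classifier are updated to minimize the task loss and maximize the adversary's loss.

```python
import torch
import torch.nn.functional as F

def adversarial_debiasing_step(encoder, classifier, adversary,
                               opt_model, opt_adv,
                               adj, features, labels, sens, train_mask, alpha=1.0):
    """One alternating update of the adversarial debiasing min-max game."""
    # Step 1: update the adversary to predict the (binary) sensitive attribute.
    with torch.no_grad():
        h = encoder(adj, features)                   # node embeddings H
    adv_loss = F.binary_cross_entropy_with_logits(adversary(h).squeeze(-1),
                                                  sens.float())
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # Step 2: update encoder + classifier to be accurate but fool the adversary.
    h = encoder(adj, features)
    task_loss = F.cross_entropy(classifier(h)[train_mask], labels[train_mask])
    fool_loss = F.binary_cross_entropy_with_logits(adversary(h).squeeze(-1),
                                                   sens.float())
    loss = task_loss - alpha * fool_loss             # maximize the adversary's loss
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()
    return task_loss.item(), adv_loss.item()
```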
The seminal work [21] firstly applies adversarial debiasing to graph-structured data to learn fair node embeddings. Specifically, it adopts adjacency matrix reconstruction with negative sampling as the utility loss to learn node embeddings. Let g(u, v) be the predicted probability that u and v are linked. The utility loss can be written as:

L_u = −∑_{v∈V} [ ∑_{u∈N(v)} log g(u, v) + Q · E_{u′∼P_n(v)} log(1 − g(u′, v)) ],

where N(v) represents the neighbors of node v, P_n(v) is the distribution for sampling negative nodes for v, and Q is the number of negative samples. An MLP is deployed as the adversary to predict the sensitive attributes from the node embeddings H. The adversarial loss is given as the binary cross-entropy loss between the predictions of the adversary and the real sensitive attributes. Masrour et al. [124] investigate adversarial debiasing with a similar implementation of the losses to learn representations for fair link prediction and to avoid the separation of users. In addition, they propose a metric that determines whether the predicted links will lead to further separation of the network to evaluate the model. Though the aforementioned methods achieve fairness with adversarial debiasing, they focus on learning node embeddings and do not use a GNN model as the encoder. Recently, FairGNN [35] proposes a framework for fair node classification with graph neural networks. It uses the node classification loss as the utility loss, i.e., L_u = −∑_{v∈V_L} [ y_v log ŷ_v + (1 − y_v) log(1 − ŷ_v) ], where V_L is the set of labeled nodes. For many real-world applications such as user attribute prediction in social media, obtaining the sensitive attributes of nodes is difficult. To address the challenge of lacking sensitive attributes for adversarial debiasing, FairGNN adopts a GCN-based sensitive attribute estimator to estimate the sensitive attributes s̃ of nodes with missing sensitive attributes. It then uses the output of the adversary and the estimated sensitive attribute s̃ for the adversarial loss. In addition, theoretical analysis in [35] demonstrates that statistical parity can be guaranteed with adversarial debiasing given the estimated sensitive attributes. All the aforementioned methods focus on debiasing the node representations. Alternatively, adversarial debiasing can also be applied to debias the original graph data. For example, EDITS [44] utilizes a WGAN-based [9] framework to eliminate the attribute bias and structural bias. The attribute matrix X is revised to ensure the same attribute distribution between different demographic groups. Similarly, an adjacency matrix A that will not cause bias after information propagation is learned. More details of the representative adversarial debiasing methods are listed in Table 7.
Fairness Constraints. In addition to adversarial debiasing, directly adding fairness constraints to the objective function of machine learning models is another popular direction. These constraints are usually derived from the fairness definitions introduced in Section 5.2. As in the general framework of fairness constraints shown in Figure 4b, they act as regularization terms and balance prediction performance against fairness. The overall objective function can be written as

min_θ L_u(θ) + λ L_fair(θ),

where θ is the set of model parameters to be learned, L_u is the loss function for the utility of the model, L_fair denotes the applied fairness constraint, and λ controls the trade-off between utility and fairness. To enforce different notions of fairness, various constraints have been investigated for fair GNNs [4, 35, 43, 91, 107, 191]. The details of these methods are given in Table 8. Statistical Parity & Dyadic Fairness. In FairGNN [35], apart from adversarial debiasing, a covariance constraint for statistical parity is also adopted to further enforce fairness. Specifically, the covariance constraint minimizes the absolute covariance between the estimated sensitive attribute s̃ and the prediction ŷ, i.e., |Cov(s̃, ŷ)| = |E[(s̃ − E[s̃])(ŷ − E[ŷ])]|. Enforcing the predictions to have no correlation with the estimated sensitive attributes helps to learn a classifier whose predictions are independent of the protected attributes.
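The covariance constraint is easy to add to any training loop; the following sketch (illustrative, with an assumed differentiable prediction tensor) computes the absolute covariance between predictions and (estimated) sensitive attributes so that it can be used as the fairness regularizer L_fair.

```python
import torch

def covariance_fairness_penalty(y_prob, sens):
    """|Cov(s, y_hat)| regularizer used to encourage statistical parity.

    y_prob: predicted probabilities, shape [N] (kept differentiable)
    sens:   (estimated) sensitive attribute values, shape [N]
    """
    sens = sens.float()
    cov = ((sens - sens.mean()) * (y_prob - y_prob.mean())).mean()
    return cov.abs()

# Usage in training: loss = task_loss + lam * covariance_fairness_penalty(probs, s_hat)
```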
UGE [191] assumes that a bias-free graph can be generated from pre-defined non-sensitive attributes. A regularization term then pushes the embeddings to satisfy the properties of the bias-free graph so as to eliminate the bias. In particular, they enforce that g(x_u, x_v), i.e., the probability of predicting a link between nodes u and v with the complete attributes, is the same as g(x̃_u, x̃_v), i.e., the probability of predicting the link between u and v with the bias-free attributes. In FairAdj [107], a regularization term |E[g(u, v) | s(u) = s(v)] − E[g(u, v) | s(u) ≠ s(v)]|, directly derived from dyadic fairness in Eq.(28), is used to debias the adjacency matrix. Individual Fairness. Moreover, two constraints for individual fairness are explored in InFoRM [91] and REDRESS [43]. InFoRM [91] proposes a regularization term ∑_{i,j} S_{ij} ∥y_i − y_j∥², where y_i and y_j denote the prediction vectors of nodes v_i and v_j, and S_{ij} ∈ [0, 1] is the similarity score between v_i and v_j. In this way, the predictions of two similar nodes are encouraged to be similar. As for REDRESS [43], it aims to optimize the consistency between the prediction similarity matrix S_ŷ and the oracle similarity matrix S_G from a ranking perspective. For each node, the relative order of each node pair given by S_ŷ and that given by S_G are enforced to be the same. Counterfactual Fairness. To achieve counterfactual fairness, NIFTY [4] proposes to maximize the agreement between the original graph and its counterfactually augmented views. More specifically, the counterfactual sample x̃ of an initial sample x is generated by (i) modifying the value of the sensitive attribute; and (ii) randomly masking the other attributes and perturbing the graph structure. Then, the constraint described in Table 8 can be applied to reduce the gap between the predictions on the original graph and its counterfactual samples, which enables counterfactual fairness. Other Categories. Apart from the aforementioned fair GNNs, there are several methods that belong to neither adversarial debiasing nor fairness constraints, which are presented in Table 9.

Table 9. Fair models on graphs belonging to other categories.
FairWalk [146]: link prediction, statistical parity
CrossWalk [94]: node classification & link prediction, statistical parity
FairCL [99]: node classification, counterfactual fairness
FairDrop [168]: node classification, statistical parity
DeBayes [22]: link prediction, statistical parity

More specifically, users in the same sensitive attribute group are more likely to be sampled into a trace by the standard random walk, resulting in a correlation between the sensitive attributes and the predictions. Therefore, unbiased sampling strategies for the random walk are investigated in [94, 146] to learn unbiased embeddings for downstream tasks such as node classification and link prediction. In [99], counterfactually augmented views generated by attribute masking and edge deletion are applied in contrastive learning to obtain fair node representations for classification. FairDrop [168] proposes to drop more connections between nodes sharing the same sensitive attribute to reduce the bias caused by homophily in sensitive attributes. DeBayes [22] investigates a Bayesian method that is capable of learning debiased embeddings by using a biased prior. Generally, graph datasets used to evaluate the performance of GNNs in terms of fairness need to (i) exhibit bias issues; and (ii) have both node labels and node sensitive attributes available if the task is node classification. Below, we list some of the widely used datasets that are suitable for evaluating the performance of fair GNNs on node classification and/or link prediction problems.
The statistics of the datasets along with papers using these datasets are presented in Table 10 . • Pokec-n & and Pokec-z [35] : Pokec-n and Pokec-z datasets collect users' data from Pokec social networks of two provinces in Slovakia in 2012, which is similar to Facebook. Each node in the graph contains attributes such as gender, age, hobbies, interest, education, working field and etc. The datasets target at predicting the occupations of users and the sensitive attribute is region. • NBA [35] : The NBA dataset utilizes 400 NBA players and their social relations on Twitter to construct the graph. The performance statistics of players in the 2016-2017 season and other information e.g., nationality, age, and salary are provided. The task is to predict whether the players' salary is over median. The sensitive attribute for this dataset is nationality, which is binarized to two categories, i.e., U.S. players and oversea players. • German Credit [4] : The German Credit dataset collects data from a German bank [11] . Nodes in the graph represent clients and edges are built between clients if their credit accounts are similar. With clients gender as the sensitive attribute, the node classification task aims to predict whether the credit risk of the clients is high. • Recidivism [4] : In the Recidivism dataset, nodes are defendants released on bail during 1990-2009 [89] . Two nodes are connected if two defendants' past criminal records and demographics are similar. The task is to predict a defendant as bail (i.e., unlikely to commit a violent crime if released) or no bail (i.e., likely to commit a violent crime) with race being the sensitive attribute. • Credit Defaulter [4] : In this dataset, nodes represent credit card users and they are connected based on the similarity of their purchase and payment records. The sensitive attribute of this dataset is age and the task is to classify whether a user will default on credit card payment. • MovieLens [21] : The MovieLens dataset is a recommender system benchmark [71] , whose target task is to predict the rating that users assign movies. Sensitive attributes about the user features, such as age, gender, and occupation, are covered in the dataset. • Reddit [21] : The Reddit dataset is based on the social media website Reddit where users can comment on content in different topical communities, called "subreddits". For sensitive attributes, this dataset treats certain subreddit nodes as sensitive nodes, and the sensitive attributes for users are whether they have an edge connecting to these sensitive nodes. • Polblog [139] : Polblog is a blog website network [2] . Nodes represent blogs and links denote hyperlink between blogs. The sensitive attributes for this dataset are blog affiliation communities. • Twitter [94] : This is a subgraph of the Twitter dataset [13, 23] . The sensitive attribute is the political leaning of each user, including neutrals, liberals and conservatives. • Facebook [91] : The dataset is collected from the Facebook website . Nodes are users and edges represent friendship between users [126] . The sensitive attribute for this dataset is gender. • Google+ [124] : It is collected from Google+ [126] . Nodes in the dataset are users and they are connected concerning their social relationships. The sensitive attribute for this dataset is gender. • Dutch [124] : This is from the school network [166] with gender as the sensitive attribute. It corresponds to friendship relations among 26 freshmen at a secondary school in the Netherlands. 
Social Network Analysis. With the emerging of social media platforms such as Facebook, Twitter, and Instagram, the social network analysis is widely conducted to provided better service to users. For example, the platforms may use GNNs to recommend new friends to a user [54] . Node classification are also widely conducted on social networks to further complete user profile for better service [35] . However, recent works [35, 146, 170] indicate that GNNs can be biased to the minority in friends recommendation and node classification on social networks. For instance, the algorithm have been found to prevent minorities from becoming influencers. The message-passing on graphs can magnify the bias [35] . Therefore, several fair GNNs [35, 94, 99, 124, 191] for social network analysis have been proposed. Recommender System. The user interactions on products such as books can link the users and products to compose bipartite graph. In addition, the social context of users may also be utilized for recommendation. Because the great power of GNN in process graphs, many platforms have applied GNNs for the recommendation system [68, 217] . But fairness issue is also reported in recommendation system. For instance, it is found that a GNN-based algorithm on book recommendation may be biased towards suggesting books with male authors [22] . Hence, it is necessary to develop fair graph neural networks for recommendation system. Financial Analysis. Recently, there is a growing interest in applying GNNs to financial applications such as loan default risk prediction [33, 111] and fraud detection [147] . In loan default risk, the guarantee network [33] or user relational graph [111] can be applied to learn more powerful representations for predictions. In fraud detection, GNNs on transaction [147] are also investigated. Similar to the applications in other domains, GNNs also exhibit bias towards protected attributes such as genders and ages in financial analysis [4] . Using fair GNNs [4, 44] in finance can ensure the fairness to users and avoid the social and legal issues caused by the bias in the GNN model. Though many fair models on graph-structured data have been investigated, there are still many important and challenging directions to be explored. Next, we list some promising research directions. Attack and Defense in Fairness. Recent works have shown that an poisoning attacker can fool the fair machine learning model to exacerbate the algorithmic bias [129, 167, 180] . For instance, one can generate poisoned data samples by maximizing the covariance between the sensitive attributes and the decision outcome and affect the fairness of the model. Thus, a seriously biased model caused by the attacker might be treated as a fair model by the end-user due to the deployment of fair algorithms, which can result in social, ethical and legal issues. Since GNNs are an extension of deep learning on graphs, fair GNNs are also at a risk of being attacked. Without understanding the vulnerability and robustness of fair GNNs, we cannot fully trust a fair GNN. Despite the initial efforts on attacking fair models [129, 167, 180] , all of them focus on i.i.d data; while the studies on vulnerability of fair GNNs are rather limited. Note that to achieve a trustworthy GNN, the robustness and fairness should be simultaneously meet. 
However, as it is discussed in Section 4, current robust GNNs generally focus on the robustness in terms of performance and rarely investigate the robust models against attacks in both accuracy and fairness. Therefore, it is crucial to investigate the vulnerability of fair GNNs and develop robust fair GNNs. Fairness on Heterogeneous Graphs. Many real-world graphs such as social networks, knowledge graph, and biological networks are heterogeneous graphs, i.e., networks containing diverse types of nodes and/or relationships. Various GNNs have been proposed to address the challenge of representation learning on heterogeneous graphs, such as learning with meta-path [78, 192] and designing new message-passing mechanisms for heterogeneous graphs [154] . Recently, it is reported that the representations learned by heterogeneous GNNs can contain discrimination [222] , which could result in societal prejudice in the applications. For example, the social biases have been identified in knowledge graphs [164] . And the representations of knowledge graph learned by heterogeneous GNNs are widely adopted to facilitate the searching and recommender system. Hence, the encoded biases in the representations could lead to detrimental societal consequences. However, existing fair algorithms are generally designed for GNNs on homogeneous graphs, which are not able to mitigate the bias brought by the meth-path neighbors or the message-passing mechanism specifically designed for heterogeneous information networks. Therefore, it is necessary to develop fair GNns to address the unique challenges brought by the heterogeneity graphs. Fairness without Sensitive Attributes. Despite the ability of the aforementioned methods in alleviating the bias issues, they generally require abundant sensitive attributes to achieve fairness; while for many real-world applications, it is difficult to collect sensitive attributes of subjects due to various reasons such as privacy issues, and legal and regulatory restrictions. As a result of, most of existing fair GNNs are challenged due to the lacking of sensitive attributes in training data. Though investigating fair models without sensitive attributes is important and challenging, only some initial efforts on i.i.d data have been conducted [72, 102, 236] . How to learn fair GNN without sensitive attributes is a promising research direction. Parallel to the effectiveness and prevalence of deep graph learning systems, the property of lacking interpretability is shared by most deep neural networks (DNNs). DNNs typically stack multiple complex nonlinear layers [226] , resulting in predictions difficult to understand. To expose the black box of these highly complex deep models in a systematic and interpretable manner, explainable DNNs [134] , have been explored recently. However, most of these works focus on images or texts, which cannot be directly applied to GNNs due to the discreteness of graph topology and the message-passing of GNNs. And it is very important to understand GNNs' predictions for two reasons. First, it enhances practitioners' trust in GNN models by enriching their understanding of the network characteristics. Second, it increases the models' transparency to enable trusted applications in decision-critical fields sensitive to fairness, privacy and safety challenges. Highquality explanations can expose the knowledge captured, helping users to evaluate the existence of possible biases, and make the model more trustworthy. 
For example, counterfactual explanations are utilized in [159] to analyze the fairness and robustness of black-box models in order to build responsible artificial intelligence systems. A model-agnostic explanation interface is also designed in [17] to continuously monitor model performance and validate its fairness. Therefore, explainable GNNs have been attracting increasing attention recently, and many efforts have been made. In this section, we provide a comprehensive survey of the current progress in the explainability of GNNs. First, we introduce the background of explaining GNN models and provide a motivating example. Following that, a comprehensive review of existing explanation methods is presented. Popular datasets and evaluation metrics in this domain are also introduced. Finally, we go through some future research directions. Compared to the existing review in [220], the main improvement of this survey is that we cover more recent progress such as self-explainable GNNs [36, 235] and discuss more reliable evaluation settings [52]. 6.1 Backgrounds. 6.1.1 Aspects of Explanation. As discussed in [39, 131], the term explainability itself needs to be explained. Generally, explainability in GNNs should (i) guide end-users or model designers to understand how the model arrives at its results; (ii) enable users to form expectations about its decisions; and (iii) provide information on how and when the trained model might break. To meet these needs, explainability must be considered within the context of particular disciplines. As a result, existing explanation methods show a high level of variety and provide explainability at different levels. Generally, explanations can be categorized from the following perspectives: • Global or Local explanations. A local explanation (or instance-level explanation) provides a justification for the prediction on each specific instance. Currently, most GNN explanation works [118, 216] fall within this group. On the other hand, global explanations (or model-level explanations) [219] reveal how the model's inference process works, independently of any particular input. • Self-explainable or Post-hoc explanations. Self-explainable GNNs are specifically designed GNN models that are intrinsically interpretable and can simultaneously give a prediction and the corresponding explanation; the explainability arises as part of the prediction process. On the contrary, post-hoc explanations focus on explaining an already trained model, and an additional explainer model is generally adopted [216]. However, due to the adoption of the explainer model, post-hoc explanations may misinterpret the actual inner workings of the target model. • Explanation forms to be presented to the end users. Explanations should help users understand GNNs' behaviors. Various explanation forms have been investigated, such as bags-of-edges [16], attribute importance [216], and sub-graphs [221]. Different explanation forms give different visualizations and offer different information to end users. • Techniques for deriving the explanations. To enable explainability, various explanation techniques have been developed, including perturbing the input [216], analyzing the internal inference process [16], and designing intrinsically interpretable GNN models [36]. These methods make different assumptions about the model, and their advantages may vary across datasets.
A comprehensive taxonomy and detailed introduction of methods are presented in Section 6.3. With the explanation model and its produced explanations obtained, different aspects of explanation quality can be evaluated. Whereas a gold standard exists for comparing predictive models, there is no agreed-upon evaluation strategy for explainable AI methods. As argued in [134], evaluating the plausibility and convincingness of an explanation to humans is different from evaluating its correctness, and these criteria should not be conflated. In this part, we systematically analyze the properties that good explanations should satisfy. Considerations of these qualities motivate the design of different explanation methods and evaluation metrics, which will be discussed in Section 6.3 and Section 6.5, respectively. • Correctness: The obtained explanations should be correct and truthfully reflect the reasoning of the target predictive model (either locally or globally). This quality addresses the faithfulness of explanations and requires that the descriptive accuracy of the explanation is high [133]. • Completeness: Completeness addresses the extent to which the identified explanations explain the target model. Ideal explanations should contain "the whole truth" [100]. High completeness is desired to provide enough detail, and it should be balanced with correctness [134]. • Consistency: The obtained explanations should be consistent with respect to inputs. In other words, explaining should be deterministic, and the same explanation should be provided for identical inputs [150]. It has also been argued that explanations should be invariant to small perturbations [7]. • Contrastivity: Contrastivity facilitates distinguishing the explanations of different predictions. The explanation of a certain event should be discriminative in comparison to those of other events [130]. For models taking different decision strategies, the explanations of their behaviors should also be distinct [3]. • User-friendliness: An ideal explanation is expected to be user-friendly. Explanations should be presented in a form that is clear, easy to interpret, and "agrees with human rationales" [12]. For example, it is argued that explanations should be compact, sparse, and avoid redundancy [134]. Some formats are also found to be more easily interpretable than others [80]. Example. Due to diverse settings and the complexity of existing algorithms, discussing and comparing GNN explanation methods can often become abstract. To make the explanation task more concrete, we give an example of instance-level explanation, which has been widely adopted in various works [118, 216, 221]. As shown in Figure 5, which depicts a target node together with its node attributes, the explanation objective is to find discriminative substructures, including edges and node attributes, that are important for the prediction of the to-be-explained GNN. Note that there are also works investigating other explanation forms, such as class-wise prototypical structures [235] and interpretable surrogate models [79]. Explaining GNNs also faces unique challenges. First, GNNs make predictions by propagating messages [63] along edges, and it is difficult to estimate the contribution of edges as they are discrete and could appear in the computational graphs multiple times at different layers. In turn, identifying important structures is even more complicated, as interactions among nodes and edges are involved. Second, graphs are less intuitive than images or texts.
It is difficult for users to identify the commonalities and dissimilarities among graphs, making the evaluation and interpretation of the obtained explanations (which are still in the form of graphs) challenging. Domain knowledge is often required to understand the obtained graph "explanations". To make deep models applicable in real scenarios, researchers have made extensive attempts to extract explanations from deep models, especially in the image and text domains. However, due to the complexity of graph data and the less human-understandable message-passing mechanism in GNNs, it is difficult to directly extend explanation methods for image or text data to graph data. Recently, to address these challenges, researchers have begun to focus on the explainability of GNN models and have proposed many dedicated methods. In this section, we provide a high-level summary of existing GNN explanation methods and categorize them into three groups: (1) instance-level post-hoc explanations, (2) model-level post-hoc explanations, and (3) self-explainable methods that are intrinsically interpretable. Most existing GNN explanation methods are designed for the instance-level post-hoc setting, and we further arrange them based on the techniques they adopt for achieving explainability. A summary of the method taxonomy is shown in Table 11 (e.g., surrogate methods [79, 182, 231], model-level post-hoc methods [219, 235], and self-explainable methods [36, 235]). 6.3.1 Instance-level Post-hoc Explanation. Instance-level post-hoc explanation identifies elements (such as node attributes and edges) that are crucial for the model's prediction on each specific input instance. Typically, given an input graph $\mathcal{G}$, which could be a graph sample for graph-level tasks or the local graph of a node for node-level tasks, it aims to find a sub-graph $\mathcal{G}_s \subset \mathcal{G}$ that accounts for the prediction output of the target GNN model. Based on their strategies for identifying input substructures as explanations, we can further summarize existing methods into three groups: (1) Attribution Methods, which directly analyze the influence of input elements on the prediction using gradients or through perturbations; (2) Decomposition Methods, which examine the inference of deep models by decomposing the prediction result into importance mass on the input; (3) Surrogate Methods, which train an interpretable model that mimics the behavior of the to-be-explained deep model within the neighborhood of the current input. Next, we introduce the details of these three categories. Attribution Methods. Attribution, also referred to as relevance [206], aims to reveal the components of high importance in the input. These methods provide explanations by measuring the contribution of input elements to the target decision and taking the subgraph with the top contribution weights as the explanation $\mathcal{G}_s$. Based on their strategies for estimating each element's contribution, we can further categorize them into two types: Gradient-based and Perturbation-based. Gradient-based Attribution. Gradient-based attribution methods estimate importance weights by back-propagated gradients. Based on the Taylor expansion, the gradients of the model output w.r.t. the input elements reflect its sensitivity towards them, which can be utilized as an importance estimator [15]. For example, the importance of node $v_i$ with attributes $\mathbf{x}_i$ for the prediction of class $c$ can be computed as $\|\nabla_{\mathbf{x}_i} f_c(\mathcal{G})\|_1$ [143]. One weakness of these methods is that local gradients can be unreliable because of the gradient saturation problem [173].
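To make the gradient-based strategy concrete, the following minimal sketch computes a vanilla gradient-saliency score per node as the L1 norm of the gradient of the predicted class score w.r.t. the node features. It is written for a generic differentiable node-classification model with a `model(x, edge_index)` interface; the model and data objects are placeholders rather than a specific published method.

```python
import torch

def gradient_node_saliency(model, x, edge_index, node_idx):
    """Vanilla gradient saliency for one node's prediction.

    model: a differentiable GNN returning class scores of shape [N, C]
    x: node feature matrix of shape [N, d]; edge_index: graph connectivity
    Returns one importance score per node: ||d f_c / d x_v||_1 for each node v.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x, edge_index)              # forward pass over the whole graph
    target_class = logits[node_idx].argmax()   # explain the predicted class
    logits[node_idx, target_class].backward()  # back-propagate its score to the inputs
    return x.grad.abs().sum(dim=1)             # L1 norm of each node's feature gradient
```

Nodes with the largest saliency values, together with the edges connecting them, would then be kept as the explanation subgraph.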
To address this saturation problem, integrated gradients (IG) [173], which consider the gradients along a path from a baseline to the input, were proposed. On graphs, the IG score of the node attributes can be computed as:

$$\text{IG}(\mathbf{x}_i) = (\mathbf{x}_i - \mathbf{x}'_i) \odot \int_{t=0}^{1} \nabla_{\mathbf{x}} f_c\big(\mathbf{x}'_i + t\,(\mathbf{x}_i - \mathbf{x}'_i)\big)\, dt, \quad (33)$$

where $\mathbf{x}'_i$ represents the baseline attributes, which can be set to the global average. Essentially, Eq. (33) integrates the gradients at all points along the path from $\mathbf{x}'_i$ to $\mathbf{x}_i$, instead of relying on the gradient at $\mathbf{x}_i$, which may suffer from saturated gradients. Existing works have investigated various gradient-based methods to explain GNNs, such as SA [16], Guided BP [16], CAM [143], and Grad-CAM [143]. These methods share similar ideas for identifying important input elements. The main difference lies in the procedure of gradient back-propagation and in how different hidden feature maps are combined [220]. For example, Guided BP [16] clips negative gradients during back-propagation to estimate contribution weights. CAM [143] requires that a linear layer is used for classification, and calculates heat maps over nodes using node embeddings from the last GNN layer together with the weights of that linear classification layer. Grad-CAM [143] generalizes it to the model-agnostic setting by using average class-wise gradients in place of the linear classification parameters. It is straightforward to estimate importance weights of node attributes with these methods, and edges connecting important nodes can also be taken as important [60]. Perturbation-based Attribution. Perturbation-based attribution methods learn a perturbation mask and examine the prediction variations w.r.t. the perturbations. Specifically, the mask is optimized to maximize the perturbation (mask out as many edges and node attributes as possible) while preserving the original predictions. The left-out unperturbed nodes and edges are taken as the most important elements contributing to the prediction, which correspond to the explanations. Let $\mathbf{M}_A \in \{0, 1\}^{N \times N}$ and $\mathbf{M}_X \in \{0, 1\}^{N \times d}$ denote binary masks on edges and node attributes, respectively. The mask learning can be formulated as:

$$\min_{\mathbf{M}_A, \mathbf{M}_X} \; \mathcal{L}\big(f(\tilde{\mathbf{A}}, \tilde{\mathbf{X}}),\, f(\mathbf{A}, \mathbf{X})\big) + \mathcal{R}(\mathbf{M}_A, \mathbf{M}_X), \quad \tilde{\mathbf{A}} = \mathcal{P}(\mathbf{A}, \mathbf{M}_A), \;\; \tilde{\mathbf{X}} = \mathcal{P}(\mathbf{X}, \mathbf{M}_X), \quad (34)$$

where $\mathcal{P}$ denotes the perturbation of the original input with the provided importance masks, and we use $\tilde{\mathbf{A}}$ to represent the perturbed $\mathbf{A}$; the same holds for $\tilde{\mathbf{X}}$. With this optimization objective, explanations are found by identifying the input elements that preserve the model predictions. For example, in GNNExplainer [216], $\mathcal{P}(\mathbf{A}, \mathbf{M}_A) = \mathbf{A} \odot \mathbf{M}_A$ and $\mathcal{P}(\mathbf{X}, \mathbf{M}_X) = \mathbf{Z} + (\mathbf{X} - \mathbf{Z}) \odot \mathbf{M}_X$, where $\mathbf{Z}$ is sampled from the marginal distribution $\mathcal{F}$ of node attributes and $\odot$ denotes element-wise multiplication. $\mathcal{L}$ is usually implemented as a cross-entropy loss [216], which encourages consistency of the prediction outputs. $\mathcal{R}$ regularizes the identified explanations and is usually implemented as a sparsity constraint. This objective promotes the correctness and non-redundancy of the found explanations. Eq. (34) is a discrete optimization problem, which is difficult to solve directly. Various learning paradigms have been proposed to efficiently find the masks [118, 216, 221]. Based on the adopted strategy for learning the perturbation masks, these approaches can be further divided into three groups. The first group identifies effective perturbation masks by conducting Searching [58, 221]. These methods optimize Eq. (34) and learn perturbation masks with an explicit test-and-run paradigm, where the optimization directions are found by search algorithms. For example, SubgraphX [221] employs the Monte Carlo Tree Search (MCTS) algorithm to search for the most important subgraph as the explanation of a prediction, with the Shapley value used as the measurement of a component's importance during the search phase.
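As a concrete, intentionally simplified illustration of the test-and-run paradigm behind this group, the sketch below scores each candidate edge by how much masking it alone changes the model's prediction and keeps the least dispensable edges. It uses exhaustive single-edge perturbation rather than MCTS or Shapley-value estimation, so it only conveys the idea of search-based mask finding; the `model(x, edge_index)` interface is an assumption.

```python
import torch

def greedy_edge_explanation(model, x, edge_index, node_idx, keep_k=5):
    """Test-and-run search: drop one edge at a time and measure the prediction change.

    Edges whose removal reduces the target node's predicted class probability the
    most are treated as the most important and returned as the explanation.
    (For undirected graphs stored with both directions, reciprocal edges should
    ideally be masked together; this sketch ignores that detail.)
    """
    with torch.no_grad():
        base = torch.softmax(model(x, edge_index), dim=-1)[node_idx]
        target = base.argmax()
        scores = []
        for e in range(edge_index.size(1)):
            keep = torch.ones(edge_index.size(1), dtype=torch.bool)
            keep[e] = False                           # mask out a single edge
            pert = torch.softmax(model(x, edge_index[:, keep]), dim=-1)[node_idx]
            scores.append((base[target] - pert[target]).item())
    # edges causing the largest probability drop form the explanation subgraph
    top = torch.tensor(scores).topk(min(keep_k, len(scores))).indices
    return edge_index[:, top]
```

Search-based methods such as SubgraphX replace this brute-force scoring with MCTS over subgraphs and Shapley-value importance estimates, which scales far better.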
ZORRO [58] revises the search process by explicitly encoding fidelity into the objective. Causal Screening [195] also falls into this group; it incrementally selects input elements by maximizing individual causal effects at each search step. The second group uses an Attention mechanism to learn perturbation masks [181, 216]. By relaxing the binary masks $\mathbf{M}_A \in \{0, 1\}^{N \times N}$ and $\mathbf{M}_X \in \{0, 1\}^{N \times d}$ into soft ones, i.e., $\mathbf{M}_A \in [0, 1]^{N \times N}$ and $\mathbf{M}_X \in [0, 1]^{N \times d}$, the soft perturbation masks can be directly optimized in an end-to-end manner. For example, GNNExplainer [216] employs a soft mask on attributes via element-wise multiplication and a mask on edges with the Gumbel softmax. These two masks are then directly optimized with the objective of minimizing the size of the unperturbed parts while preserving the prediction results. The last group [118, 196] uses an Auxiliary Model to predict effective perturbation masks from the information in the graph and the target model. The auxiliary model is optimized with Eq. (34) on training samples, and it is assumed to be general enough to safely explain newly-coming graphs after training. For example, PGExplainer [118] adopts an explanation network $g$ to predict the preserving probability of each given edge based on node embeddings derived from the target model $f(\mathcal{G})$, which can be formally written as $\mathbf{M}_A \sim g(\mathcal{G}, f(\mathcal{G}))$. Unlike the previous two strategies, the perturbation mask does not need to be re-learned from scratch for each to-be-explained graph, and this group of methods is much faster at test time. Another representative method is GraphMask [155], which trains a model to produce layer-wise edge perturbation masks. It takes node embeddings from the corresponding layer as input and provides different edge importance masks at different propagation steps. Decomposition Methods. These methods seek to decompose the prediction of the target GNN model into contributions of the input features. A contribution score is assigned to each input element, and an explanation is obtained by identifying the inputs with the highest scores. Concretely, the influence mass is back-propagated layer-by-layer onto each input element, and the influence mass from input to output is decomposed based on the neural excitation at each layer. During this process, nonlinear components of the GNN model are generally neglected to ease the problem. Popular decomposition strategies on image data include Layer-wise Relevance Propagation (LRP) [14] and Excitation BP [169]. Several efforts have been made to extend them to graph data [16, 143, 157]. For example, the decomposition rules of the original LRP algorithm (such as the $\epsilon$-stabilized rule) are extended to work on the message-passing mechanism of GNNs [16]. GNN-LRP [156] evaluates the contributions of bags-of-edges and deduces back-propagation rules on graph walks with a high-order Taylor decomposition. For example, for GCN, as shown in [156], the back-propagation rule that decomposes the contribution mass of node $K$ at layer $l+1$ to node $J$ at layer $l$ can be written as Eq. (35). The algorithm starts by assigning the full contribution mass to the target output and redistributes it with a backward pass over the GNN, layer by layer. For simplicity, we use $R^{(l+1)}_{K,j}$ to represent the contribution mass of the $j$-th dimension of node $K$'s embedding at layer $l+1$, and use $R^{(l)}_{J,i \leftarrow K,j}$ to denote its decomposed contribution to the $i$-th dimension of node $J$'s embedding at layer $l$.
Assuming the node embedding dimension to be $d$, we have:

$$R^{(l)}_{J,i \leftarrow K,j} = \frac{h^{(l)}_{J,i}\, \lambda_{JK}\, w_{ij}}{\sum_{J'} \sum_{i'=1}^{d} h^{(l)}_{J',i'}\, \lambda_{J'K}\, w_{i'j}}\; R^{(l+1)}_{K,j}, \quad (35)$$

where $h^{(l)}_{J,i}$ is the $i$-th dimension of node $J$'s embedding at layer $l$, $w_{ij}$ represents the weight linking neuron $i$ in layer $l$ to neuron $j$ in layer $l+1$, which is a scalar, and $\lambda_{JK}$ is the edge weight connecting node $J$ to node $K$. Following this rule, contributions can be back-propagated to the inputs, and the nodes with the highest values are preserved as the explanation $\mathcal{G}_s$. Surrogate Methods. Neural networks are treated as black-box models due to their deep architectures and nonlinear operations. It has been observed that they have a highly complex loss landscape [104] and are challenging to explain directly. To circumvent the nonlinear classification boundary of a trained DNN model, many attempts have been made [148] to approximate the DNN's local predictions around each instance $\mathbf{x}$ with simple interpretable models such as logistic regression. Specifically, for the target instance $\mathbf{x}$, a group of prediction records can be obtained by applying small random noises to it and collecting the target model's predictions on these perturbed inputs. Then, an interpretable surrogate model can be trained on these records to mimic the target model's behavior locally, which serves as the explanation. The search for the interpretable local model $g(\mathbf{x})$ can be written as:

$$g(\mathbf{x}) = \arg\min_{g' \in G'} \mathcal{L}(f, g', \pi_{\mathbf{x}}) + \Omega(g'),$$

where $G'$ represents the candidate interpretable model families and $\pi_{\mathbf{x}}$ denotes the local neighborhood around instance $\mathbf{x}$. $\mathcal{L}(f, g', \pi_{\mathbf{x}})$ measures the faithfulness of the surrogate model $g'$ in approximating the target model $f$ in the locality $\pi_{\mathbf{x}}$, and $\Omega(g')$ measures model complexity to encourage simple surrogate models [148]. Once the surrogate model $g(\mathbf{x})$ is trained, the explanation of the prediction of $f$ on $\mathbf{x}$ can be obtained by examining the interpretable function $g(\mathbf{x})$. There are several works extending this idea to explain GNNs [79, 182, 231]. They differ from each other mainly in two aspects: the strategy of obtaining local prediction records ($\pi_{\mathbf{x}}$), and the interpretable model family ($G'$) selected as candidate surrogates. For example, GraphLime [79] takes neighboring nodes as perturbed inputs and employs a nonlinear surrogate model that can assign large weights to features that are important for inference. However, GraphLime ignores graph structures and can only find important node attributes. PGM-Explainer [182] randomly perturbs node attributes to collect local records and trains a probabilistic graphical model (PGM) to fit them. The PGM is interpretable and can show the dependencies among the nodes of the input graph. RelEx [231] randomly samples subgraphs as inputs and uses a GCN as the surrogate model. It can assign importance weights to edges, but it requires an additional run of explanation methods on the GCN surrogate, since the GCN is itself non-interpretable. 6.3.2 Model-level Post-hoc Explanation. Compared to instance-level methods, model-level methods focus more on providing general insights, i.e., high-level explanations of deep graph models that are independent of particular inputs. Model-level explanations can be representative and discriminative instances for each class [219, 235] or logic rules depicting the knowledge captured by the deep model [215]. However, due to the highly diverse topology and complex semantics in the graph domain, this is a very challenging task, and few attempts have been made to provide model-level post-hoc explanations for GNNs. To the best of our knowledge, XGNN [219] is the only work on model-level explanation for graph-structured data.
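Before detailing XGNN, the sketch below makes the surrogate idea described above concrete for a node-level task: it collects the target GNN's predictions on randomly perturbed copies of a node's attributes and fits an interpretable logistic-regression surrogate whose coefficients serve as attribute-importance explanations. The Gaussian perturbation scheme and the surrogate family are illustrative choices for this sketch, not those of a specific published GNN explainer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_surrogate_explanation(predict_fn, x_node, num_samples=500, noise=0.1):
    """LIME-style local surrogate for one node.

    predict_fn: maps a perturbed attribute vector to the target model's predicted
    class for that node (the GNN and the rest of the graph are held fixed inside
    this closure). Assumes the sampled labels contain more than one class.
    Returns per-attribute importance weights taken from the surrogate's coefficients.
    """
    records, labels = [], []
    for _ in range(num_samples):
        x_pert = x_node + noise * np.random.randn(*x_node.shape)  # local neighborhood
        records.append(x_pert)
        labels.append(predict_fn(x_pert))          # query the black-box GNN
    surrogate = LogisticRegression(max_iter=1000).fit(np.stack(records), np.array(labels))
    return surrogate.coef_                         # interpretable attribute weights
```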
XGNN aims to expose which input graph patterns can trigger certain predictions of the target GNN model, and adopts a graph generation module to achieve this. It employs input optimization and trains a graph generator, using reinforcement learning, to generate graphs that maximize the target predictions. After training, the generated graphs are expected to be representative of each class and to provide a global view of the knowledge captured by the target GNN model. Concretely, the desired prototypical explanation for class $c$ is obtained by solving the following objective:

$$\mathcal{G}^*_c = \arg\max_{\mathcal{G}} f_c(\mathcal{G}),$$

where $f$ denotes the target GNN model and a graph generator is trained to find $\mathcal{G}^*_c$ for class $c$. 6.3.3 Self-explainable Approaches. Different from post-hoc explanation, self-explainable approaches [36, 235] aim to give predictions and provide explanations for each prediction simultaneously. Specific GNN architectures are adopted to support built-in interpretability, similar to the attention mechanism in GAT [181]. However, although these methods are explainable by design, they are often restricted in their modeling space and may struggle to generalize across tasks. Two representative self-explainable GNNs have been proposed recently, i.e., SE-GNN [36] and ProtGNN [235]. SE-GNN [36] focuses on instance-level self-explanation. It obtains the self-explanation for node classification by identifying interpretable K-nearest labeled nodes for each node and utilizes these K-nearest labeled nodes to simultaneously give the label prediction and explain why the prediction is given. More specifically, SE-GNN [36] adopts interpretable similarity modeling to compute the attribute similarity and local structure similarity between target nodes and labeled nodes. A contrastive pretext task is further deployed in SE-GNN to provide self-supervision for the interpretable similarity metric learning. ProtGNN [235] is better suited for global-level explanations by finding several prototypes for each class. Newly-coming instances are classified by comparing them with these prototypes in the embedding space. A conditional subgraph sampling module is designed to conduct subgraph-level matching, and several regularization terms are used to promote the diversity of the prototypical embeddings. For comparing various explanation methods, it is desirable to have datasets where the rationale linking input graphs to output labels is intuitive and easy to obtain, so that the identified explanatory substructures can be readily evaluated. In this subsection, we summarize popular datasets used by existing works on explainable GNNs, which can roughly be categorized into synthetic datasets and real-world datasets. Several representative benchmarks of both groups are introduced in detail. 6.4.1 Synthetic Data. With a carefully-designed graph generation mechanism, we can enforce unique causal relations between input elements and the provided labels in synthetic datasets. GNN models must capture such patterns for successful training, and the obtained explanations are evaluated against these ground-truth causal substructures. Several common synthetic datasets are listed below: • BA-Shapes [216]: A single graph consisting of a base Barabasi-Albert (BA) graph (containing 300 nodes) and 80 "house"-structured motifs (containing 5 nodes each). The "house" motifs are randomly attached to the base BA graph. Nodes in the base graph are labeled as 0 and those in the motifs are labeled as 1, 2, and 3 based on their positions.
Explanations are evaluated on the attached motif nodes, with the edges inside the corresponding motif as ground-truth. • Tree-Cycles [216]: A single network with an 8-level balanced binary tree as the base graph. 80 cycle motifs (containing 6 nodes each) are randomly attached to the base graph. Nodes in the base graph are labeled as 0 and those in the motifs are labeled as 1. Ground-truth explanations for nodes within the cycle motifs are provided for evaluation. • BA-2motifs [216]: A graph classification dataset containing 800 graphs. Half of the graphs are constructed by attaching a "house" motif to a BA base graph, while the other half attach a five-node cycle motif. A binary label is assigned to each graph according to its attached motif, and the motif serves as the ground-truth explanation. • Infection [52]: A single network initialized as an Erdős-Rényi (ER) random graph. 5% of the nodes are labeled as infected (class 0), and the remaining nodes are labeled by their shortest distance to the infected ones. In evaluation, nodes with multiple shortest paths are discarded, so that every remaining node has one distinct path as the oracle explanation for its label. • Syn-Cora [36]: This dataset is synthesized from Cora [96] to provide ground-truth explanations, i.e., K-nearest labeled nodes and edge matching results. To construct the graph, motifs are obtained by sampling local graphs of nodes from Cora. Various levels of noise are applied to the motifs in attributes and structures to generate similar local graphs. 6.4.2 Real-world Data. Due to the high complexity of patterns and the possible existence of noise, it is challenging to obtain a human-understandable rationale from node features and graph topology to labels for real-world graphs. Typically, strong domain knowledge is needed. Thus, real-world graph datasets with ground-truth explanations are limited. Below are two benchmark datasets: • Molecule Data [202]: This is a graph classification dataset. Each graph corresponds to a molecule, with nodes representing atoms and edges representing chemical bonds. Molecules are labeled according to their chemical properties, and discriminative chemical groups are identified using prior domain knowledge. The chemical groups NH2 and NO2 are used as ground-truth explanations. • Sentiment Graphs [220]: This contains three graph classification datasets created from text datasets for sentiment analysis, i.e., Graph-SST2, Graph-SST5, and Graph-Twitter. Each graph is a text document where nodes represent words and edges represent relationships between word pairs constructed from parse trees. Node attributes are set as word embeddings from BERT [42]. No ground-truth explanations are provided, so heuristic metrics are usually adopted for evaluation. A summary of representative benchmark datasets is provided in Table 12, along with their key statistics, tasks, and the papers that use them. Besides visualizing identified explanations and conducting expert examinations, several metrics have been proposed for quantitatively evaluating explanation approaches from different perspectives. Next, we present the major categories of metrics and their distinctions. • Explanation Accuracy. For graphs whose ground-truth rationale is known, one direct evaluation method is to compare the identified explanatory parts with the real causes of the label [118, 216]. F1 and ROC-AUC scores can both be computed on the identified edges; the higher these scores are, the more accurate the obtained explanation is. • Explanation Fidelity.
In the absence of ground-truth explanations, heuristic metrics can be designed to measure the fidelity of the identified substructures [220]. The basic idea is that explanatory substructures should play a more important role in the predictions. Concretely, Fidelity+ [220] is computed by first removing all input elements and then gradually adding back the features with the highest explanation scores; heuristically, a faster increase in the GNN's prediction indicates stronger fidelity of the obtained explanations. In turn, Fidelity- [220] is computed by sequentially removing edges following their assigned importance weights; a faster performance drop represents stronger fidelity of the removed explanations. • Sparsity. Good explanations are expected to be minimal structural explanations, as only the most important input features should be identified. This criterion directly measures the sparsity of the obtained explanation weights [59]; a better explanation should be sparser. • Explanation Stability. As good explanations should capture the intrinsic rationale between input graphs and their labels, this criterion requires the identified explanations to be stable with respect to small perturbations [4]. A stability score can be computed by comparing how explanations change after attaching new nodes or edges to the original graph. However, a GNN's prediction is sensitive to its input, so it is challenging to select a proper amount of perturbation. In this part, we provide our opinions on some promising future research directions. We hope they can inspire and encourage the community to work on bridging the gaps in GNN interpretation. Class-level Explanations. Despite the many interpretation approaches, class-wise explanations remain an under-explored area. Instance-level explanations provide a local view of a GNN's prediction. However, it is important to recognize that they may provide only anecdotal evidence and scale poorly to large graph sets. On the other hand, class explanations could provide users with both a global view of the model's behavior and a discriminative view grounded in each class, making it easier to expose and evaluate the learned knowledge. Benchmark Datasets for Interpretability. One great obstacle in designing and evaluating interpretation methods is the measurement of the provided interpretability. To a large extent, the difficulty is inherent. For example, it is impossible to provide gold labels for what a correct explanation is; after all, we would not need to design an explanation approach in the first place had we known the real explanations. Currently, we rely upon heuristic metrics and approximations, but a set of principled proxy measurements and benchmark datasets is yet to be established. User-Oriented Explanations. The purpose of designing explanation approaches is to use them on real-world tasks to expose the learned knowledge of trained models. Based on the needs of users, different requirements may arise in the form of the explanation and the level of interpretability. Hence, we encourage researchers to consider real user cases, select suitable design choices, and evaluate explanation algorithms in real-world scenarios for the integrity of the interpretability field. One promising direction is to provide flexible, fine-grained, multi-level explanations so that end-users can select the level of explanation they can understand or that satisfies their criteria. In this survey, we conducted a comprehensive review of trustworthy GNNs from the aspects of privacy, robustness, fairness, and explainability.
This fills the gap of lacking a systematic summary of privacy-preserving GNNs and fairness-aware GNNs. For robustness and explainability, we introduce recent trends in more detail, in addition to the representative methods reviewed before. More specifically, for each aspect, we introduce the core definitions and concepts to help readers understand the defined problems. The introduced methods are categorized from various perspectives, and the unified framework of each category is generally given, followed by the detailed implementations of representative methods. In addition, we list the datasets used in privacy, fairness, and explainability, where the proposed methods have special requirements on the datasets for training or evaluation. Numerous real-world applications of trustworthy GNNs are also provided to encourage researchers to develop practical trustworthy GNNs. Finally, we discuss the future research directions of each aspect at the end of each section, which include promising directions within a single aspect and interactions between aspects for trustworthy GNNs.
REFERENCES
Deep learning with differential privacy The political blogosphere and the 2004 us election: divided they blog Sanity checks for saliency maps Towards a unified framework for fair and stable graph representation learning Graph-based deep learning for medical diagnosis and analysis: past, present and future Privacy-preserving machine learning: Threats and solutions On the robustness of interpretability methods Local differential privacy for deep learning Wasserstein generative adversarial networks A survey on graph neural networks for knowledge graph completion Uci machine learning repository A diagnostic study of explainability techniques for text classification On the efficiency of the information networks in social media On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation How to explain individual classification decisions Explainability techniques for graph convolutional networks Responsible machine learning with interactive explainability and fairness in python Data decisions and theoretical implications when adversarially learning fair representations Adversarial attacks on node embeddings via graph poisoning Conference on Machine Learning Certifiable robustness to graph perturbations Compositional fairness constraints for graph embeddings Debayes: a bayesian method for debiasing network embeddings Measuring user influence in twitter: The million follower fallacy A restricted black-box adversarial framework towards attacking graph embedding models Fastgcn: fast learning with graph convolutional networks via importance sampling Link prediction adversarial attack Can adversarial network attack be defended?
Fast gradient attack on network embedding A survey of adversarial learning on graphs Understanding structural vulnerability in graph convolutional networks Iterative deep graph learning for graph neural networks: Better and robust node embeddings Supervised community detection with line graph neural networks Risk assessment for networked-guarantee loans using high-order graph attention representation Towards robust graph neural networks for noisy graphs with sparse labels Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information Towards self-explainable graph neural network Adversarial attack on graph structured data Adversarial training methods for network embedding A survey of the state of explainable ai for natural language processing Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity Batch virtual adversarial training for graph convolutional networks Pre-training of deep bidirectional transformers for language understanding Individual fairness for graph neural networks: A ranking based approach Edits: Modeling and mitigating data bias for graph neural networks Do the young live in a "smaller world" than the old? age-specific degrees of separation in a large-scale mobile communication network Quantifying privacy leakage in graph embedding Fairness through awareness Calibrating noise to sensitivity in private data analysis The algorithmic foundations of differential privacy Censoring representations with an adversary All you need is low (rank) defending against adversarial attacks on graphs When comparing to ground truth is wrong: On evaluating gnn explanation methods Jointly attacking graph neural network and its explanations Graph neural networks for social recommendation Graph adversarial training: Dynamically regularizing based on graph structure Adversarial graph contrastive learning with information regularization Hard masking for explaining graph neural networks Hard masking for explaining graph neural networks Valid, sparse, and stable explanations in graph neural networks Learning to explain graph neural networks Applications of community detection techniques to brain graphs: Algorithmic considerations and implications for neural function Robustness of graph neural networks at scale Neural message passing for quantum chemistry Community structure in social and biological networks Generative adversarial nets Explaining and harnessing adversarial examples Learning robust representation through graph adversarial contrastive learning Inductive representation learning on large graphs Equality of opportunity in supervised learning Explainable predictive business process monitoring using gated graph neural networks The movielens datasets: History and context Fairness without demographics in repeated loss minimization Serverless multi-task federated learning for graph neural networks Stealing links from graph neural networks Node-level membership inference attacks against graph neural networks Strategies for pre-training graph neural networks Gpt-gnn: Generative pre-training of graph neural networks Heterogeneous graph transformer Local interpretable model explanations for graph neural networks An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models Evaluating differentially private machine learning in practice Differential privacy and machine learning: a survey and review Could graph 
neural networks learn better molecular representation for drug discovery? a comparison study of descriptor-based and graph-based models Certified robustness of graph convolution networks for graph classification under topological attacks Power up! robust graph convolutional network against evasion attacks based on graph powering Node similarity preserving graph convolutional networks Adversarial attacks and defenses on graphs: A review, a tool and empirical studies Graph structure learning for robust graph neural networks The effect of race/ethnicity on sentencing: Examining sentence type, jail length, and prison length Advances and open problems in federated learning Inform: Individual fairness on graph mining What can we learn privately? Brainnetcnn: Convolutional neural networks for brain networks; towards predicting neurodevelopment Fairness-enhanced node representation learning How to find your friendly neighborhood: Graph attention design with self-supervision Semi-supervised classification with graph convolutional networks Predict then propagate: Graph neural networks meet personalized pagerank Federated learning: Strategies for improving communication efficiency Fairness-aware node representation learning Too much, too little, or just right? ways explanations impact end users' mental models Fairness without demographics through adversarially reweighted learning Learning to discover social circles in ego networks Visualizing the loss landscape of neural nets Adversarial attack on large scale graph Adversarial privacy-preserving graph embedding against inference attack On dyadic fairness: Exploring and mitigating bias in graph connections Interpretable brain graph neural network for fmri analysis Graph neural network-based diagnosis prediction Unified robust training for graph neural networks against label noise Credit risk and limits forecasting in e-commerce consumer lending service via multi-view-aware mixture-of-experts nets Learning generative adversarial representations (gap) under fairness and censoring constraints Information obfuscation of graph neural networks Trustworthy ai: A computational perspective Federated social recommendation with graph neural network On the fairness of disentangled representations Pre-training graph neural networks for link prediction in biomedical networks Learning to drop: Robust graph neural network via topological denoising Auto-encoder based graph convolutional networks for online financial anti-fraud Towards more practical adversarial attacks on graph neural networks Graph adversarial attack via rewiring Learning adversarially fair and transferable representations Bursting the filter bubble: Fairness-aware network link prediction Image labeling on a network: using social-network metadata for image classification Learning to discover social circles in ego networks Communication-efficient learning of deep networks from decentralized data A survey on bias and fairness in machine learning Exacerbating algorithmic bias through fairness attacks Explanation in artificial intelligence: Insights from the social sciences Explaining explanations in ai A collection of benchmark datasets for learning with graphs Definitions, methods, and applications in interpretable machine learning From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable ai A dual heterogeneous graph attention network to improve long-tail performance for shop search in e-commerce Releasing graph neural networks with differential privacy 
guarantees Membership inference attack on graph neural networks Social data: Biases, methodological pitfalls, and ethical boundaries Debiasing graph embeddings via the metadata-orthogonal training unit Tri-party deep network representation Semi-supervised knowledge transfer for deep learning from private training data Decentralized federated graph neural networks Explainability methods for graph convolutional neural networks Graph contrastive coding for graph neural network pre-training Membership inference attack against differentially private deep learning model Towards fair graph embedding Explainable fraud transaction detection on heterogeneous graphs why should i trust you?" explaining the predictions of any classifier A survey of privacy attacks in machine learning Perturbation-based explanations of prediction models. In Human and machine learning Self-supervised graph transformer on large-scale molecular data Locally private graph neural networks Graph neural networks for friend ranking in large-scale social platforms Modeling relational data with graph convolutional networks Interpreting graph neural networks for nlp with differentiable edge masking Higher-order explanations of graph neural networks via relevant walks Layerwise relevance visualization in convolutional text graph classifiers Collective classification in network data A common framework to provide explanations and analyse the fairness and robustness of black-box models Overlapping community detection with graph neural networks Pitfalls of graph neural network evaluation Model stealing attacks against inductive graph neural networks Membership inference attacks against machine learning models Fairness in algorithmic decision-making: Applications in multi-winner voting, machine learning, and recommender systems Ethics guidelines for trustworthy ai Introduction to stochastic actor-based models for network dynamics Poisoning attacks on algorithmic fairness Biased edge dropout for enhancing fairness in graph representation learning Striving for simplicity: The all convolutional net Algorithmic glass ceiling in social networks: The effects of social recommendations on network diversity Adversarial attack and defense on graph data: A survey Adversarial attacks on graph neural networks via node injections: A hierarchical reinforcement learning approach Axiomatic attribution for deep networks A framework for understanding unintended consequences of machine learning Deep representation learning for social network analysis Adversarial attack on hierarchical graph pooling neural networks Transferring robustness for graph neural network against poisoning attacks Investigating and mitigating degree-related biases in graph convoltuional networks Single node injection attack against graph neural networks Poisoning attacks on fair machine learning Graph attention networks Probabilistic graphical model explanations for graph neural networks Privacy-preserving representation learning on graphs: A mutual information perspective Certified robustness of graph neural networks against adversarial structural perturbation Graphfl: A federated learning framework for semi-supervised node classification on graphs Deep knowledge-aware network for news recommendation Robust unsupervised graph representation learning via mutual information maximization Scalable attack on graph data by injecting vicious nodes A review on graph neural network methods in financial applications Learning robust representations with graph denoising policy network Unbiased 
graph embedding with biased graph observations Heterogeneous graph attention network Unsupervised learning for community detection in attributed networks based on graph convolutional network Towards robust graph convolutional networks Causal screening to interpret graph neural networks Towards multi-grained explainability for graph neural networks Recent advances in reliable deep graph learning: Adversarial attack, inherent noise, and distribution shift Model extraction attacks on graph neural networks: Taxonomy and realization Adapting membership inference attacks to gnn for graph classification: Approaches and implications Federated graph neural network for privacy-preserving recommendation Adversarial examples for graph data: deep insights into attack and defense Moleculenet: a benchmark for molecular machine learning 30th USENIX Security Symposium (USENIX Security 21) (2021) Learning how to propagate messages in graph neural networks Federated graph classification over non-iid graphs Looking deeper into deep learning model: Attribution-based explanations of textcnn Towards consumer loan fraud detection: Graph neural networks with role-constrained conditional random field Differentially private network embedding More is better (mostly): On the backdoor attacks in federated graph neural Unsupervised adversarially-robust representation learning on graphs Topology attack and defense for graph neural networks: An optimization perspective Graph-based prediction of protein-protein interactions with attributed signed graph embedding Local differential privacy and its applications: A comprehensive survey Federated machine learning: Concept and applications Learn to explain efficiently via neural logic inductive learning Generating explanations for graph neural networks Graph convolutional neural networks for web-scale recommender systems Graph contrastive learning with augmentations Towards model-level explanations of graph neural networks Explainability in graph neural networks: A taxonomic survey On explainability of graph neural networks via subgraph explorations Fair representation learning for heterogeneous information networks Data poisoning attack against knowledge graph embedding Adversarial label-flipping attack and defense for graph neural networks Robust heterogeneous graph neural networks against adversarial attacks Visual interpretability for deep learning: a survey Graph embedding matrix sharing with differential privacy Graph embedding for recommendation against attribute inference attacks Attributed graph clustering via adaptive graph convolution Defending graph neural networks against adversarial attacks Relex: A model-agnostic relational model explainer Inference attacks against graph neural networks Backdoor attacks to graph neural networks Extracting private graph data from graph neural networks Protgnn: Towards self-explaining graph neural networks You can still achieve fairness without sensitive attributes: Exploring biases in non-sensitive features Imbalanced node classification on graphs with graph neural networks Exploring edge disentanglement for node classification Asfgnn: Automated separated-federated graph neural network.
Peer-to-Peer Networking and Applications Vertically federated graph neural network for privacy-preserving node classification Graph neural networks: A review of methods and applications Deep graph structure learning for robust representations: A survey Adversarial attacks on neural networks for graph data Adversarial attacks on graph neural networks via meta learning Certifiable robustness and robust training for graph convolutional networks Certifiable robustness of graph convolutional networks under structure perturbations