key: cord-0225175-rw4ylbd4 authors: Ding, Kexin; Zhou, Mu; Wang, Zichen; Liu, Qiao; Arnold, Corey W.; Zhang, Shaoting; Metaxas, Dimitri N. title: Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications date: 2022-02-17 journal: nan DOI: nan sha: c8a018d72af8b1a972694a891c7ac0c13810c380 doc_id: 225175 cord_uid: rw4ylbd4 Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research. 
Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The recent surge of graph convolutional networks (GCNs) provides opportunities to address this information complexity via graph-driven architectures. GCNs have demonstrated their capability to perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These advances in GCNs have prompted a new wave of research and application in medical imaging analysis with an overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain in designing the important image-to-graph transformation for multi-modality medical imaging and in gaining insights into model interpretation and enhanced clinical decision support. In this review, we share rapid developments of GCNs in the context of medical image analysis, including radiology, histopathology, and other related imaging modalities. We discuss the fast-growing synergy of graph network architectures and medical imaging components to advance our assessment of disease status and outcome in clinical tasks. To provide a guideline that fosters cross-disciplinary research, we present emerging opportunities and identify common challenges in image-based GCNs and their extensions in model interpretation, technical advancements, and large-scale benchmarks that can transform the scope of medical image studies and related graph-driven medical research.

Graph representation has been broadly studied in information extraction, relational representation, and multi-modality data fusion (Zhou et al., 2020a; Bruna et al., 2013; Wu et al., 2020). The rich topological and spatial characteristics of graphs essentially uncover differential relations among individual graph elements (Zhang et al., 2019).
In medical image analysis, the diverse shape, anatomy, and appearance information provide a key data source to characterize the interactions among the diagnostic regions of interest (ROIs) and reveal disease status (Failmezger et al., 2020). Therefore, image-based graph modeling and inference can deepen our understanding of the complex relational patterns hidden in disease tissue regions. The surge of graph convolutional networks (GCNs), a branch of deep learning characterized by graph-level model development, has brought a new wave of information fusion techniques through their widespread applications in medical imaging, from disease classification (Adnan et al.) and tumor segmentation (Anklin et al., 2021) to patient outcome prediction (Raju et al.).

Fig. 1. (a) Multi-modality medical imaging and other non-image data can be jointly considered for GCN modeling and analysis. (b) Graph representation learning. The image-graph transformation pipeline includes node selection, node attribute extraction, and edge construction. For different types of medical images, we aim to design a variety of task-specific transformation strategies. (c) Graph convolutional networks framework. The input of GCNs is the constructed data-rich graphs based on image contents. The GCNs architecture contains input, hidden, and output layers to allow information extraction and inference. (d) Clinical tasks. We review a broad range of tasks with clinical relevance that incorporate disease detection, segmentation, and outcome prediction.

Conceptually, GCNs broadly fall into two categories: spectral-based and spatial-based GCNs. First, the spectral-based graph convolutions are defined in the spectral domain based on the graph Fourier transform, which can be regarded as an analogy of the signal Fourier transform in 1-D space.
Second, the spatial-based graph convolutions are defined in the spatial domain, where the aggregation of node representations draws on the collective information of neighboring nodes. Also, we discuss important graph pooling modules as downsampling strategies to reduce the size of graph representation (Zhou et al., 2020a), which can critically alleviate issues of overfitting, permutation invariance, and computational complexity in the development of graph neural networks. In later sections, we define a graph as $G = (V, E)$, where $V$ is the set of graph nodes and $E$ is the set of edges between nodes. For graph representation learning, we use $H$ to denote the hidden state vectors of nodes.

Spectral-based graph convolutional networks are derived from the field of graph signal processing, where the spectral-based convolutional operators are defined in the spectral domain (Zhou et al., 2020a). Theoretically, a graph signal $x$ is transformed to the spectral domain by a graph Fourier transform $\mathcal{F}$ before the convolution operation. In this way, the spectral-based graph convolutions can be computed by taking the inverse Fourier transform of the multiplication between two Fourier transformed graph signals (Failmezger et al., 2020). The resulting signal is then transformed back by the inverse graph Fourier transform $\mathcal{F}^{-1}$. These transformations are defined as:

$$\mathcal{F}(x) = U^{T} x, \qquad \mathcal{F}^{-1}(x) = U x.$$

Here $U$ is the matrix of eigenvectors of the normalized graph Laplacian matrix $L = I_N - D^{-1/2} A D^{-1/2}$, where $I_N$ is the identity matrix, $D$ is the node degree matrix, and $A$ is the adjacency matrix, which represents the connectivity between every two nodes. $L$ is real symmetric positive semidefinite. With this property, the normalized Laplacian matrix can be factorized as $L = U \Lambda U^{T}$, where $\Lambda$ is a diagonal matrix of all the eigenvalues.
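As a concrete illustration of the definitions above, the following minimal NumPy sketch builds the normalized Laplacian of a toy graph, factorizes it, and applies the graph Fourier transform and its inverse. The 4-node path graph, the signal values, and the helper name `normalized_laplacian` are illustrative assumptions and are not taken from the reviewed works.

```python
import numpy as np


def normalized_laplacian(A):
    """Return L = I_N - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    deg = A.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor instead of a division by zero.
    safe_deg = np.where(deg > 0, deg, 1.0)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


# Toy 4-node path graph 0-1-2-3 (illustrative data, not from the paper).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)

# L is real symmetric, so eigh returns real eigenvalues and orthonormal
# eigenvectors, i.e. L = U diag(eigvals) U^T.
eigvals, U = np.linalg.eigh(L)

x = np.array([1.0, -2.0, 0.5, 3.0])  # graph signal: one scalar per node
x_hat = U.T @ x                      # graph Fourier transform, F(x) = U^T x
x_back = U @ x_hat                   # inverse transform, F^{-1}(x_hat) = U x_hat
```

Because $U$ is orthonormal, `x_back` recovers `x` up to floating-point error, and the eigenvalues of the normalized Laplacian fall in the interval [0, 2].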
According to the graph Fourier transform, the convolution of the input graph signal $x$ with a filter $g \in \mathbb{R}^{N}$ is defined as:

$$g \star x = \mathcal{F}^{-1}(\mathcal{F}(g) \odot \mathcal{F}(x)) = U (U^{T} g \odot U^{T} x),$$

where $\odot$ denotes the element-wise product and $U^{T} g$ is a filter in the spectral domain. If we simplify the filter by a learnable diagonal matrix $g_\theta = \mathrm{diag}(U^{T} g)$, then the spectral graph convolution can be simplified as:

$$g_\theta \star x = U g_\theta U^{T} x.$$

The majority of spectral-based graph convolutional networks are based on the above definitions, and the design of the filter $g_\theta$ determines the performance of individual approaches. Normally, the spectral-based graph convolutional network designs the convolution operation in the Fourier domain by computing the eigen-decomposition of the graph Laplacian. These methods assume that the filter $g_\theta = \Theta^{(k)}_{i,j}$ is a set of learnable parameters and consider graph signals with multiple channels. Because the filters depend on the eigen-decomposition of the Laplacian matrix, any perturbation to a graph can change the eigenbasis, so the learned filters are domain dependent and generalize poorly across graph structures. Also, eigen-decomposition has a high computational complexity that is unfavorable for large-scale data processing.

To overcome these limitations, especially the computational complexity, the Chebyshev spectral CNN (ChebNet) (Defferrard et al., 2016) uses $K$-order polynomial filters to achieve good localization in the vertex domain by integrating the node features within the $K$-hop neighborhood, i.e., $g_\theta = \sum_{i=0}^{K} \theta_i T_i(\tilde{L})$, where $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$ and $\lambda_{max}$ denotes the largest eigenvalue of $L$. The eigenvalues of $\tilde{L}$ lie in $[-1, 1]$. The Chebyshev polynomials are defined recursively as $T_i(x) = 2x T_{i-1}(x) - T_{i-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$. The convolution operation can be written as:

$$g_\theta \star x = \sum_{i=0}^{K} \theta_i T_i(\tilde{L})\, x. \tag{5}$$

For a similar purpose of improving computational efficiency, CayleyNet (Levie et al., 2018) applies Cayley polynomials, which are parametric rational functions, to capture narrow frequency bands.
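The Chebyshev filtering step of ChebNet described above can be sketched as follows. This is a minimal NumPy illustration rather than the reference implementation: the function name `chebyshev_filter`, the 5-node path graph, the coefficient values, and the default choice of 2 for the largest eigenvalue are assumptions made for the example. The recursion only needs matrix-vector products with the rescaled Laplacian, so no eigen-decomposition is required; the decomposition is used here solely to check the result.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Compute sum_i theta[i] * T_i(L_tilde) @ x via the Chebyshev recursion."""
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])
    t_prev = x                       # T_0(L_tilde) x = x
    out = theta[0] * t_prev
    if len(theta) == 1:
        return out
    t_curr = L_tilde @ x             # T_1(L_tilde) x = L_tilde x
    out = out + theta[1] * t_curr
    for i in range(2, len(theta)):
        # T_i = 2 * L_tilde * T_{i-1} - T_{i-2}
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[i] * t_next
        t_prev, t_curr = t_curr, t_next
    return out


# Toy 5-node path graph 0-1-2-3-4 (illustrative).
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

theta = np.array([0.5, -0.3, 0.2])   # K = 2, arbitrary coefficients
x = np.zeros(n)
x[0] = 1.0                           # unit impulse at node 0
y = chebyshev_filter(L, x, theta)

# Reference result through the explicit spectral route U g_theta U^T x.
eigvals, U = np.linalg.eigh(L)
y_spectral = U @ (cheb.chebval(eigvals - 1.0, theta) * (U.T @ x))
```

With K = 2, the response to an impulse at node 0 is zero at nodes 3 and 4, which are more than two hops away; this is the K-hop localization property mentioned in the text.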
For CayleyNet, the spectral graph convolution operation is defined as:

$$g_\theta \star x = c_0 x + 2\,\mathrm{Re}\Big\{ \sum_{j=1}^{r} c_j (hL - iI)^{j} (hL + iI)^{-j} x \Big\},$$

where $\mathrm{Re}(\cdot)$ returns the real part of a complex number, $c_0$ is a real coefficient, $c_j$ is a complex coefficient, $i$ is the imaginary unit, and $h$ is the parameter that controls the spectrum of a Cayley filter. ChebNet could be regarded as a special case of CayleyNet via the use of the Chebyshev polynomial approximation to reduce the computational complexity.

A notable variant of ChebNet, the GCN of Kipf and Welling (2016), further reduces the computational complexity by truncating the Chebyshev polynomial to its first-order approximation, so that the central node only considers its 1-hop neighboring nodes. The approach simplifies the filter in (5) with $K = 1$ and $\lambda_{max} = 2$ to alleviate the problem of overfitting:

$$g_\theta \star x \approx \theta_0 x + \theta_1 (L - I_N) x = \theta_0 x - \theta_1 D^{-1/2} A D^{-1/2} x. \tag{7}$$

To restrain the number of parameters and avoid overfitting, GCN further assumes that $\theta = \theta_0 = -\theta_1$, so that $g_\theta = \theta (I_N + D^{-1/2} A D^{-1/2})$. To solve the exploding or vanishing gradient problem in (7), the renormalization $I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is applied, with $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The propagation layer of GCN is then defined as:

$$H = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta, \tag{8}$$

where $X \in \mathbb{R}^{N \times F}$ is the input matrix, $\Theta \in \mathbb{R}^{F \times F'}$ is the parameter matrix, and $H \in \mathbb{R}^{N \times F'}$ is the output matrix. $F$ and $F'$ are the dimensions of the input and the output, respectively.

Subsequent work has sought to improve the feasibility and consistency of GCN on graph models. The adaptive graph convolution network (AGCN) (Li et al., c) constructs and learns a residual graph Laplacian matrix for each sample in the batch through a learnable distance function that takes two nodes' features as inputs. The residual graph Laplacian matrix yields strong performance on public graph-structured datasets. In addition, the dual graph convolutional network (DGCN) (Zhuang and Ma) likewise augments the graph Laplacian, as AGCN does (Li et al., c). DGCN jointly considers the local consistency and global consistency on graphs through two convolutional networks.
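The GCN propagation layer described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `gcn_layer`, the toy graph, and the random features and weights are hypothetical, and the nonlinearity that usually follows each layer is omitted so that the output matches the linear propagation rule as written.

```python
import numpy as np


def gcn_layer(A, X, Theta):
    """One linear GCN propagation step: D~^{-1/2} (A + I) D~^{-1/2} X Theta."""
    A_tilde = A + np.eye(A.shape[0])        # add self-loops
    d_tilde = A_tilde.sum(axis=1)           # >= 1 for every node due to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return A_hat @ X @ Theta


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))             # N = 4 nodes, F = 3 input features
Theta = rng.standard_normal((3, 2))         # F = 3 -> F' = 2 output features
H = gcn_layer(A, X, Theta)                  # shape (N, F') = (4, 2)
```

On a graph without edges the normalized matrix reduces to the identity, so the layer degenerates to a per-node linear map `X @ Theta`; the graph structure enters only through the neighborhood averaging.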
In DGCN, the first convolutional network is the same as (8), while the second network replaces the adjacency matrix with the positive pointwise mutual information (PPMI) matrix.

Spectral-based graph convolutional networks have a solid theoretical foundation derived from graph signal processing. Despite efforts to reduce the computational complexity, the generalization power of spectral-based GCNs remains limited compared with the broadly used spatial-based approaches discussed below. Currently, spectral-based methods train filters on a fixed graph structure, so the trained filters cannot be applied to a new graph with a different structure. However, graph structures can vary dramatically in both size and connectivity in practical applications (Li et al., c). Limited generalization across tasks and high computational complexity are therefore the major hurdles to developing spectral-based graph convolutional networks.

The spatial graph convolutional operation essentially focuses on aggregating and updating node representations by propagating node information along edges. The aggregation strategy can directly improve generalization to differently structured graphs by aggregating the information from neighboring nodes and updating the center node representation. The message-passing neural network (MPNN) (Gilmer et al.) represents a general framework of spatial-based GCNs. The key forward propagation strategy of MPNN is to pass information between nodes directly through edges. As defined in the propagation function below, MPNN runs $T$ message-passing iterations so that information can be propagated between nodes:

$$m_v^{t+1} = \sum_{w \in \mathcal{N}(v)} M_t(h_v^{t}, h_w^{t}, e_{vw}), \qquad h_v^{t+1} = U_t(h_v^{t}, m_v^{t+1}),$$

where $h_v^{t}$ is the hidden state of node $v$ at step $t$, $\mathcal{N}(v)$ is the set of neighbors of $v$, $e_{vw}$ is the feature of the edge between $v$ and $w$, and $M_t$ and $U_t$ are the learnable message and update functions.

Notably, GraphSAGE (Hamilton et al.) is a general inductive framework that generates embeddings by sampling and aggregating features from a node's local neighborhood. GraphSAGE leverages node feature information to efficiently generate node embeddings for previously unseen data (Hamilton et al.).
The propagation rule of GraphSAGE follows:

$$h_{\mathcal{N}(v)}^{k} = \mathrm{AGGREGATE}_k\big(\{h_u^{k-1}, \forall u \in \mathcal{N}(v)\}\big), \qquad h_v^{k} = \sigma\big(W^{k} \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^{k})\big),$$

where AGGREGATE is an aggregator function that aggregates information from node neighbors and $\sigma$ is a nonlinear activation function. Three types of aggregators are utilized in GraphSAGE: the mean aggregator, the LSTM aggregator, and the pooling aggregator. $W^{k}$ is a set of weight matrices that are used to propagate information between layers. CONCAT is the concatenation operation. Interestingly, GraphSAGE with a mean aggregator can be considered an inductive version of GCN.

To identify the graph structures that cannot be distinguished by GraphSAGE (Xu et al., 2018), the Graph Isomorphism Network (GIN) (Xu et al., 2018) was proposed as a maximally powerful architecture for distinguishing non-isomorphic graphs. As proved for GIN (Xu et al., 2018), an injective aggregation update maps different node neighborhoods to different feature vectors so that non-isomorphic graphs can be distinguished. To achieve the injectivity of AGGREGATE, sum-pooling is applied in GIN. The AGGREGATE and COMBINE steps are integrated as follows:

$$h_v^{k} = \mathrm{MLP}^{k}\Big( (1 + \epsilon^{k}) \cdot h_v^{k-1} + \sum_{u \in \mathcal{N}(v)} h_u^{k-1} \Big),$$

where MLP is a multi-layer perceptron that can represent the composition of functions and $\epsilon^{k}$ is a learnable parameter or a fixed scalar.

Attention mechanisms, widely used in sequence-based tasks, have been increasingly applied in spatial-based GCN models (Veličković et al., 2017; Zhou et al., 2020a), and several key works utilize attention mechanisms on graphs. Different from the design of spectral and spatial convolutional operations, attention-based convolutional operations assign different weights to neighbors to stabilize the learning process and thus alleviate noise effects. A benefit of attention mechanisms is that they can deal with variable-sized inputs and focus on the most relevant parts of the input to make decisions (Veličković et al., 2017). The Graph Attention Network (GAT) (Veličković et al., 2017) proposes a computationally efficient graph attentional layer that leverages self-attention and multi-head attention mechanisms.
The GAT layer is parallelizable across all nodes in the graph, allows different importance weights to be assigned to nodes within neighborhoods of different sizes (i.e., nodes of different degree), and does not depend on knowing the entire graph structure. The coefficients computed by the attention mechanism and the propagation of GAT are formulated as:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{T} [W h_i \,\|\, W h_j])\big)}{\sum_{k \in \mathcal{N}_i} \exp\big(\mathrm{LeakyReLU}(a^{T} [W h_i \,\|\, W h_k])\big)}, \qquad h_i' = \sigma\Big( \sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j \Big), \tag{13}$$

where $a$ and $W$ are the learnable weight vector and weight matrix, and $\|$ is the concatenation operation. Furthermore, GAT leverages multi-head attention (Vaswani et al.) to stabilize the learning process of self-attention (13), which can be written as:

$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j \Big),$$

where $\alpha_{ij}^{k}$ are normalized attention coefficients computed by the $k$-th attention mechanism. GAT achieved significant improvement in both transductive and inductive tasks. In particular, in the inductive task (e.g., the protein-protein interaction dataset), GAT improved the micro-averaged F1 score by 20.5% compared to the best GraphSAGE result.

In summary, spatial-based convolutional graph operations follow a neighborhood aggregation strategy, in which the representation of a node is iteratively updated by aggregating the representations of its neighbors. After $k$ iterations of aggregation, a node's representation captures the structural information within its $k$-hop network neighborhood. The rapid development of spatial-based GCNs has displayed their computational efficiency, graph-structure flexibility, and potential generalization across tasks when compared with spectral-based GCNs. First, spatial-based GCNs tend to be more efficient than spectral-based GCNs because they directly perform convolutions in the graph domain via node information propagation. Thus spatial-based GCNs do not have to perform eigenvector computation or handle the whole graph simultaneously. Second, spatial-based models are flexible enough to handle multi-sourced graph inputs via the convenient aggregation function.
These graph inputs can be prepared as edge inputs (Scarselli et al., 2008; Kearnes et al., 2016; Pham et al.; Simonovsky and Komodakis), directed graphs (Atwood and Towsley; Li et al., 2017), signed graphs (Derr et al.), and heterogeneous graphs (Wang et al., b). Third, spatial-based models perform graph convolutions locally on each node, where network weights can be efficiently generalized across different nodes and graph structures. Therefore, spatial-based models have been shown to achieve superior performance on both transductive (e.g., semi-supervised learning) and inductive (e.g., traditional supervised learning) tasks with flexibility on graph structures.

Graph pooling is a key strategy to address the computational challenges derived from graph convolutional operations (Mesquita et al., 2020). Pooling operations reduce the size of a graph representation while preserving valuable structural information. Typically, graph pooling layers are located after graph convolutional layers and work as a down-sampling strategy. Graph pooling can be categorized into global and hierarchical graph pooling, as shown in Fig. 2. The global pooling operation aggregates the node representations via simple flattening procedures, such as summing, averaging, or taking the maximum of the node embeddings, and is widely used in graph classification tasks (Mesquita et al., 2020). Further, global sorting pooling (Zhang et al., a) sorts the node features in descending order based on their last feature channel, and the k largest nodes form the updated graph representation of the global sorting pooling layer. Also, global attention pooling (Li et al., 2015) acts as a soft attention mechanism that decides which nodes are relevant to the current graph-level task. Such global pooling strategies, also known as readout layers, are often used to generate a graph-level representation from the previous node representations.
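The global readout and sorting operations just described can be sketched as below. This is a simplified NumPy illustration: the function names `global_readout` and `sort_pool` and the toy embedding matrix are assumptions, and the sorting step only ranks nodes by their last feature channel (it does not pad small graphs or break ties using earlier channels, which full implementations may do).

```python
import numpy as np


def global_readout(H, mode="mean"):
    """Collapse node embeddings H (num_nodes x num_features) into one graph vector."""
    if mode == "sum":
        return H.sum(axis=0)
    if mode == "mean":
        return H.mean(axis=0)
    if mode == "max":
        return H.max(axis=0)
    raise ValueError(f"unknown readout mode: {mode}")


def sort_pool(H, k):
    """Keep the k nodes with the largest last-channel value, in descending order."""
    order = np.argsort(-H[:, -1], kind="stable")[:k]
    return H[order]


# Toy node embeddings: 3 nodes, 2 feature channels (illustrative values).
H = np.array([[1.0, 0.2],
              [3.0, 0.9],
              [2.0, 0.5]])

graph_sum = global_readout(H, "sum")     # per-channel sum over nodes
graph_mean = global_readout(H, "mean")   # per-channel average over nodes
graph_max = global_readout(H, "max")     # per-channel maximum over nodes
top2 = sort_pool(H, k=2)                 # rows of nodes 1 and 2, in that order
```

Sum, mean, and max readouts are invariant to the order in which nodes are listed, which is why they are convenient for graph-level classification.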
Hierarchical pooling operations are designed to refine the node representation through down-sampling and to mitigate model overfitting. Hierarchical pooling strategies can be further categorized into two types: clustering-based and sorting-based methods. Among clustering methods, spectral clustering (SC) offers an efficient means to find strongly connected communities on a graph. SC can be used in GCNs to implement pooling operations that aggregate nodes belonging to the same cluster (Bianchi et al., 2019). However, the expense of the eigen-decomposition of the Laplacian and the generalization of SC strategies have yet to be explicitly addressed. Alternatively, a graph clustering approach (Bianchi et al., 2019) formulates a continuous relaxation of the normalized min-cut problem and trains GCNs to compute cluster assignments.

Fig. 2. (a) Global graph pooling. The function of global graph pooling is to flatten the node representations to a graph representation. In the node representation, each node includes multiple dimensions of node attributes. After global graph pooling, the most representative feature is selected as the node attribute in the graph representation. (b) Clustering graph pooling. Clustering-based pooling offers an efficient means to find strongly connected communities on a graph. The nodes in the same cluster are represented by a new cluster node representation. (c) Sorting graph pooling. Sorting-based pooling updates the node representation by sorting the node attributes or edge weights. Both (b) and (c) are hierarchical pooling operations that refine the node representation to gain model robustness and improve computation efficiency.

Spatial-based clustering strategies are proposed to achieve a higher computation efficiency compared with spectral-based clustering strategies.
For example, DIFFPooling (Ying et al., 2018) is a differentiable graph pooling strategy that can generate hierarchical representations of graphs and can be combined with various GCN architectures in an end-to-end fashion (Ying et al., 2018). The key design of DIFFPooling is to learn a differentiable soft cluster assignment for nodes at each GCN layer, mapping nodes to a set of clusters, which then form the coarsened input for the next GCN layer. Sorting-based methods focus on updating the node representation by sorting the nodes and edges according to their attributes or weights. TopKPooling and SAGPooling share a similar idea of sorting nodes by their attention scores (Cangea et al., 2018; Gao and Ji; Knyazev et al., 2019; Lee et al.). These pooling operations select the top-k nodes to summarize the entire graph for further feature computations. Notably, TopKPooling and SAGPooling can drop nodes during model training to improve computation efficiency and thus mitigate model overfitting. From the graph edge perspective, EdgePooling (Diehl, 2019) is an inspiring example that drops edges and merges nodes by sorting all edge scores and successively choosing the edges with the highest scores whose two nodes have not yet been part of a contracted edge.

To optimize the performance of graph network models, there are multiple trade-offs between the network architecture and the corresponding model performance. The ability to collect information and the strategy for aggregating it effectively are crucial factors for the performance of GCN models. Intuitively, a deeper architecture corresponds to a larger receptive field, which can collect more auxiliary information towards enhanced performance of GCNs. However, the performance might decrease when layers go deeper to obtain larger receptive fields in real applications (Liu et al., c).
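One way to see why depth can hurt is to apply neighborhood averaging repeatedly and watch how far apart the node features remain. The sketch below is a deliberately stripped-down illustration and not a trained GCN: it assumes mean aggregation with self-loops (random-walk normalization rather than the symmetric normalization of the GCN layer), uses no weights or nonlinearities, and runs on a hypothetical toy graph of two triangles joined by a single bridge edge, with one-hot features marking the two groups.

```python
import numpy as np

# Two triangles (nodes 0-2 and 3-5) joined by the bridge edge 2-3 (illustrative).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

A_tilde = A + np.eye(n)                               # self-loops
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)      # row-stochastic mean aggregation

# One-hot "class" features: first triangle vs. second triangle.
X = np.array([[1, 0], [1, 0], [1, 0],
              [0, 1], [0, 1], [0, 1]], dtype=float)


def feature_spread(H):
    """Largest gap between any two nodes in any single feature channel."""
    return float((H.max(axis=0) - H.min(axis=0)).max())


depths = [1, 2, 4, 8, 16, 32, 64]
spreads = []
H = X.copy()
applied = 0
for depth in depths:
    while applied < depth:
        H = P @ H                                     # one round of neighbor averaging
        applied += 1
    spreads.append(feature_spread(H))
```

The recorded spread shrinks as the number of propagation steps grows, so nodes from the two groups end up with nearly identical features after many layers.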
Such performance deterioration could be attributed to the over-smoothing of node representations with increased architecture depth. In other words, repeated and mixed message aggregation can make the node representations of different classes indistinguishable (Chen et al., a). Over-smoothing commonly occurs in nodes that are densely connected with other nodes (e.g., the core of the graph) and can therefore rapidly aggregate information from the entire graph. In contrast, a node in a tree-like part of the graph (e.g., a leaf) can only capture a very small fraction of the information of all nodes with a small number of GCN layers. To improve GCN model performance, it is necessary to overcome the over-smoothing phenomenon and achieve informative node representations. For example, the study (Li et al., b) implemented a co-training and self-training scheme, and a smoothness regularizer term with adaptive edge optimization (Chen et al., a) has been used to alleviate the over-smoothing problem. Co-training a GCN with a random walk model can explore the global graph topology. Further, self-training a GCN can exploit its feature extraction capability to overcome its localized limitation. Informative node representation via the jumping knowledge network (JK-Net) (Xu et al.) tends to demonstrate compelling performance in graph computing efficiency and to alleviate overfitting. Notably, aggregation across layers helps select the most informative nodes and reduce overfitting, and the LSTM-attention variant can further identify the useful neighborhood ranges. Inspired by the architecture of JK-Net, the deep adaptive graph neural network (DAGCNs) (Liu et al., c) developed an adaptive score calculation scheme for each layer, which balances the information from both local and global neighborhoods for each node.
Both JK-Net and DAGCNs aim to find a trade-off between accuracy and the size of receptive fields by adaptively adjusting the information from local and global neighborhoods. For the design of network architecture, we expect additional efforts to overcome the over-fitting issues while keeping a flexible architecture to explore more meaningful information in the context of disease detection and diagnosis.

Over the past decades, multi-modality radiological images have been routinely utilized in abnormality segmentation (Ibragimov and Xing, 2017; Havaei et al., 2017), detection (Chilamkurthy et al., 2018; Grewal et al.), and patient outcome classification (Lakshmanaprabu et al., 2019; Frid-Adar et al.). In this section, we discuss the growing body of GCN studies applied to radiological analysis (Ktena et al., 2018; Zhang et al., c; Burwinkel et al.), including magnetic resonance imaging (MRI), computed tomography (CT), and X-ray imaging. The combination of GCNs and radiological imaging promises to reflect the interaction among tissue regions and provide an intuitive means to fuse the morphological and topological-structured features among key image regions to advance modeling, interpretation, and outcome prediction. We here discuss representative neuroimaging research and other related studies to highlight the usefulness of GCNs across different radiological imaging modalities and clinical tasks.

In neuroimaging, multi-modality MRI is a useful diagnostic technique that provides high-quality three-dimensional (3D) images of brain structures with detailed structural information (Kong et al., 2019). Conceptually, multi-modality MRI data can be categorized into functional MRI (fMRI), structural MRI (sMRI), and diffusion MRI (DMRI). The fMRI measures brain activity and detects the changes in blood oxygenation and blood flow in response to neural activity (Logothetis et al., 2001).
The sMRI translates the local differences in water content into different shades of gray that serve to outline the shapes and sizes of the brain's various subregions (Seeman and Madras, 2013). The DMRI is a magnetic resonance imaging technique in which the contrast mechanism is determined by the microscopic mobility of water molecules (Narayan, 2018). All these imaging modalities provide vital diagnostic support for neurological disease analysis because they can capture anatomical, structural, and diagnosis-informative features in neurology. Therefore, the overarching goal is to develop useful graph network models to define, explore, and interpret interactions of brain neurons and tissues. The detailed process of utilizing GCNs in neuroimaging analysis is illustrated in Fig. 3.

Fig. 3. Multi-modality MRIs are first converted into a graph structure, which is determined by the regions of interest in terms of real human brain signals or fiber connectivity (e.g., node and edge definitions). Through graph-level model development and inference, we highlight numerous image-based analyses and diagnoses of diseases in neurology.

To analyze the complex brain region connectivity and interaction, a brain graph representation can intuitively portray human brain organization, neurological disorders, and associated clinical diagnosis. Conventionally, the human brain could be modeled as a biological brain network containing nodes (e.g., regions of interest) and edges among brain network nodes. The edges could be determined by brain signals or the real fiber connections. Yet these biologically defined networks are often unable to faithfully capture neurological disorders and outcomes of patients (Korolev et al.). To overcome this challenge, it is encouraged to leverage informative image-based features to considerably enrich graph node attributes.
A comprehensive graph representation can integrate multiple types of information (e.g., image features, human brain signals, and clinical data) to greatly expand the knowledge base of brain dynamics and potentially provide auxiliary clinical diagnosis assistance. The use of GCNs here can help augment the architecture of human brain networks and has achieved remarkable progress in explaining functional abnormality from the network mechanism (Sporns, 2014). In particular, GCNs are able to consider the functional or structural relations among brain regions together with image-based features, which is beyond the scope of conventional CNN-based methods (Shen et al., 2017; Wang et al., 2017). The CNN-based model is merely viewed as a feature extractor for disease representation without consideration of the structure information of the brain. For example, a deep 3-D convolutional neural network architecture was unable to capture the underlying structure information for Alzheimer's disease classification using brain MRI scans (Korolev et al.). By contrast, the convergence of GCN methods and MRI provides an alternative means to characterize the architecture of human brain networks and has achieved outstanding progress in brain abnormality explanation (Sporns, 2014).

The graph representations can be divided into functional and structural brain connectivity graphs based on the definitions of the graph components. First, graph nodes are regions of interest (ROIs) as defined in MRI. ROI definition is commonly done through the anatomical parcellation of the Montreal Neurological Institute (MNI) using sMRI and fMRI data (Tzourio-Mazoyer et al., 2002; Zalesky et al., 2010; Liu et al., 2020). Second, graph edges are determined by the physical connectivity (e.g., the fiber tracts) of nodes in structural brain networks, while they are calculated from signal series analysis in functional brain networks.
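As a rough illustration of how a functional brain graph might be assembled from ROI signal series, the sketch below correlates synthetic time series and keeps the strongest connections per node. Everything specific here is an assumption for the example: the function name `functional_graph`, the use of Pearson correlation as the similarity measure, the k-nearest-neighbor edge rule, and the synthetic five-ROI data. Real pipelines differ in parcellation, preprocessing, and thresholding.

```python
import numpy as np


def functional_graph(ts, k=2):
    """Build a functional graph from ROI time series.

    ts: array of shape (num_timepoints, num_rois).
    Returns (corr, adj): the Pearson correlation matrix, whose rows can serve
    as node attributes, and a symmetric binary adjacency matrix linking each
    ROI to its k most strongly correlated ROIs.
    """
    corr = np.corrcoef(ts, rowvar=False)           # (num_rois, num_rois)
    strength = np.abs(corr)
    np.fill_diagonal(strength, 0.0)                # ignore self-correlation
    n = strength.shape[0]
    nearest = np.argsort(-strength, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)                   # make the graph undirected
    return corr, adj


# Synthetic signals: ROIs 0 and 1 share one latent source, ROIs 2 and 3 share
# another, and ROI 4 is independent noise (illustrative data only).
rng = np.random.default_rng(0)
source = rng.standard_normal((200, 2))
noise = 0.3 * rng.standard_normal((200, 5))
ts = np.column_stack([source[:, 0], source[:, 0],
                      source[:, 1], source[:, 1],
                      np.zeros(200)]) + noise

corr, adj = functional_graph(ts, k=2)
node_features = corr                               # one attribute row per ROI
```

In this toy setting the ROI pairs that share a latent source are strongly correlated and therefore connected, while the adjacency matrix stays symmetric with an empty diagonal.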
We discuss functional and structural brain connectivity graph developments below.

The human brain functional connectivity denotes the functional relations between specific human brain areas, and functional brain graphs can represent estimates of interactions among time series of neuronal activity (Sporns, 2014). In functional brain networks, the nodes are defined as brain parcellation ROIs, while the node attributes could be hand-crafted features or correlation measurements between nodes. The edges are created through the node correlations between different regions. For example, a GCN framework achieved high-level performance in classifying autism spectrum disorder (ASD) and healthy controls (HC) using task-functional magnetic resonance imaging (task-fMRI) through an appropriate ROI definition (Li et al., e). Their model uses a message-passing neural network (MPNN) (Gilmer et al.) for the convolutional layers, which is invariant to graph symmetries (Li et al., e). Furthermore, Top-k pooling (Cangea et al., 2018) downsamples the nodes to achieve a higher computation efficiency while preserving a meaningful graph delineation. For a similar ASD and HC classification, another study (Li et al., f) used GAT (Veličković et al., 2017) as the convolutional layer, combined with Top-k (Cangea et al., 2018; Gao and Ji) and SAGPooling (Lee et al.) to obtain node importance scores. They also introduced two distance losses to enhance the distinction among nodes. Also, a group-level consistency loss is added to force the node importance scores to be similar for different input instances in the first pooling layer. Inspired by metric learning, a siamese graph convolutional neural network (s-GCN) was proposed for ASD and HC classification (Ktena et al., 2018), where samples were collected from the Autism Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2014) database and the UK Biobank (Sudlow et al., 2015).
The graph metric learning method essentially utilized a GCN layer (Kipf and Welling, 2016) in a siamese network (Koch et al.). Two types of graphs, spatial and functional, are designed as the input of the model; both determine nodes by ROIs, and the KNN algorithm is utilized for both spatial and functional graph construction. The inputs to s-GCN are two spatial or functional graphs with the same structure but different signals (e.g., rows or columns of the functional matrix). Notably, a spatio-temporal graph could be used to analyze the functional dependency between different brain regions and the information in the temporal dynamics of brain activity simultaneously. A spatio-temporal graph convolutional network has been utilized to analyze the blood-oxygen-level-dependent (BOLD) signal of resting-state fMRI (rs-fMRI) for human age and sex prediction (Gadgil et al.). Also, studies have analyzed the BOLD signal of fMRI for accurate detection of cognitive state changes of the human brain by presenting a dynamic graph learning approach that generates an ensemble of subject-specific dynamic graph embeddings. The resulting brain networks are able to disentangle cognitive events more accurately than the raw BOLD signals. The functional graph reflects the average functional connection strength between pairs of brain regions within a population. Generally, Pearson's correlation is a useful strategy to construct the functional connectivity matrix and define the node attributes. The human brain's structural connectivity in vivo can be captured by structural and diffusion MRI (Achard et al., 2006; van der Horn et al., 2017), and structural brain graphs could represent anatomical wiring diagrams (Sporns, 2014). Similar to the definition of nodes in functional connectivity networks, the nodes in structural connectivity networks are defined as regions of interest (ROIs), which are parcellated from the brain based on structural and diffusion MRI.
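The Pearson-based construction of a functional connectivity graph mentioned above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the function names, the toy time series, and the cut-off of 0.5 on the absolute correlation are assumptions chosen only for illustration, and real pipelines typically operate on parcellated fMRI signals with many more ROIs and time points.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def functional_graph(roi_series, threshold=0.5):
    """Return (connectivity matrix, binary adjacency matrix).

    roi_series: one list of floats per ROI (hypothetical input format).
    threshold:  assumed cut-off on |r| used to keep an edge.
    """
    n = len(roi_series)
    conn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(roi_series[i], roi_series[j])
            conn[i][j] = conn[j][i] = r
    # Keep an edge when the absolute correlation reaches the threshold.
    adj = [[1 if i != j and abs(conn[i][j]) >= threshold else 0
            for j in range(n)] for i in range(n)]
    return conn, adj

# Toy example: three ROIs with five time points each.
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with ROI 0
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
conn, adj = functional_graph(series)
```

In this sketch, row i of `conn` can serve as the attribute vector of node i, which mirrors the use of connectivity rows as node attributes described in this section.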
Clinically, the structural brain connectivity represents the structural associations of altered neuronal elements, including both the morphometric alteration and accurate anatomical connectivity as seen in imaging. In the complex brain networks, structural brain connectivity refers to the white matter projections that link cortical and subcortical regions (Kabbara et al., 2016). The edges indicate the actual neural fiber connections between different brain regions. For example, a stacked architecture that combines a heterogeneous GCN model (Zhang et al., c) with an efficient adaptive pooling scheme (Ying et al., 2018) is able to predict the clinical score of Parkinson's disease (PD) and HC using diffusion-weighted MRI (DWI) on the Parkinson Progression Marker Initiative (PPMI) dataset (Marek et al., 2011). To construct the graph structure from DWI, nodes are defined by the ROIs in the brain, while three whole-brain probabilistic tractography algorithms are used to determine different structural brain networks. The node attributes are defined as the corresponding rows of the human brain network. Another novel framework is developed to explore graph structure in the q-space by representing dMRI data and utilizing graph convolutional neural networks to estimate tissue microstructure (Chen et al., b). This approach is capable of not only reducing the data acquisition time but also accelerating the estimation procedure of tissue microstructure. The nodes of the weighted graphs are sets of points on a manifold. Also, the adjacency weights are defined between two nodes using Gaussian kernels, accounting for differences in gradient directions and diffusion weightings. The q-space signal measurements are represented by using the constructed graph that encodes the geometric structure of q-space sampling points. A residual ChebNet (Defferrard et al., 2016) can learn the mapping between sparsely sampled q-space data and high-quality estimates of microstructure indices.
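To make the Gaussian-kernel adjacency for q-space sampling points more concrete, the following sketch multiplies one kernel on the angular difference between gradient directions with one kernel on the difference in diffusion weighting. The exact kernel form, the use of the squared dot product (so that antipodal directions are treated as identical), the square root of the b-value, and both bandwidths are assumptions made for illustration; they are not claimed to be the formulation of the cited study.

```python
import math

def qspace_weight(g_i, b_i, g_j, b_j, sigma_angle=0.5, sigma_b=10.0):
    """Assumed Gaussian-kernel weight between two q-space samples.

    g_i, g_j: unit gradient directions (3-tuples).
    b_i, b_j: diffusion weightings (b-values).
    sigma_angle, sigma_b: hypothetical kernel bandwidths.
    """
    dot = sum(a * b for a, b in zip(g_i, g_j))
    angular = 1.0 - dot ** 2                      # 0 for parallel or antipodal directions
    radial = (math.sqrt(b_i) - math.sqrt(b_j)) ** 2
    return (math.exp(-angular / (2 * sigma_angle ** 2))
            * math.exp(-radial / (2 * sigma_b ** 2)))

def qspace_adjacency(directions, b_values):
    """Weighted adjacency matrix over all sampling points (no self-loops)."""
    n = len(directions)
    return [[0.0 if i == j else
             qspace_weight(directions[i], b_values[i], directions[j], b_values[j])
             for j in range(n)] for i in range(n)]

# Toy sampling scheme: two antipodal directions on one shell, one direction on another.
directions = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b_values = [1000.0, 1000.0, 2000.0]
W = qspace_adjacency(directions, b_values)
```

Samples that are close in both direction and diffusion weighting receive weights near one, while samples that differ in either respect are down-weighted, which is the behavior the text attributes to the Gaussian kernels.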
Beyond single-modality MRI analysis, multi-modality image data analysis has emerged as an active research area for GCNs modeling. Multi-modality MRI data analysis is able to deepen our understanding of disease diagnosis from different data aspects. In neuroimaging, the structural connectivity in sMRI reflects the anatomical pathways of white matter tracts connecting different regions, whereas the functional connectivity in fMRI encodes the correlation between the activity of brain regions. A unique advantage of multi-modality MRI data analysis is that it incorporates complementary information from different modalities simultaneously. Multimodal data fusion can be implemented by two types of strategy: (1) constructing the original graphs directly using the partial information from functional and structural brain connectivity; (2) constructing the original functional and structural graphs separately and updating the graph representations by computing and fusing the two-side information. For the first fusion strategy, the study (Yang et al.) introduced an edge-weighted graph attention network (EGAT) (Veličković et al., 2017) with DiffPool (Ying et al., 2018) to classify bipolar disorder (BP) and HC from sMRI and fMRI in cerebral cortex analysis. Also, the framework of the Siamese community-preserving graph convolutional network (SCP-GCN) (Liu et al., b) is able to learn the structural and functional joint embedding of brain networks on two public datasets (i.e., the Bipolar and HIV datasets (Liu et al., b)) for brain disease classification. In particular, the siamese architecture can exploit pairwise similarity learning of brain networks to guide the learning process and alleviate the data scarcity problem (Liu et al., b). Ninety cerebral regions are selected as nodes for both structural (e.g., Diffusion Tensor Image (DTI)) and functional (e.g., fMRI) networks, and the node attribute is determined by the functional connectivity between nodes corresponding to fMRI.
The edge connectivity is determined by the DTI via a series of preprocessing steps (distortion correction, noise filtering, repetitive sampling from the distributions of principal diffusion directions for each voxel). To preserve the community property of brain networks, a community loss is designed to minimize the intra-community loss and maximize the inter-community loss. For the second fusion strategy, the study (Zhang et al., b) fused information from multimodal brain networks on rs-fMRI and dMRI for age prediction. After constructing the original functional and structural brain networks separately, the study reconstructed the positive and negative connections in the functional networks depending on the structural networks' information. They utilized a multi-stage graph convolutional layer, motivated by GAT (Veličković et al., 2017) and ResGCN (Li et al., a), for structural network edge feature update and classification. Then, the edge features and classes are utilized to update the functional networks for age prediction. Also, for liver lesion segmentation, a mutual information-based graph co-attention module (Mo et al., 2021) is proposed by extracting modality-specific features from T1-weighted images (T1WI) and establishing the regional correspondence from T2-weighted images (T2WI) simultaneously. In this study, they constructed two separate graphs, one for T1WI and one for T2WI; meanwhile, they used GCN (Kipf and Welling, 2016) to propagate representations of all nodes. The mutual information-based graph co-attention module updates the T1WI-based node representation by selectively accumulating information from the node features of the T2WI-based node representation. The fused node representation is re-projected and added to the original feature for the final segmentation. Furthermore, GCNs extend to allow multi-modality integration between MRI and non-imaging data for analyzing complex disease patterns.
For example, an Edge-Variational GCN (EV-GCN) (Huang and Chung) could automatically integrate imaging data (e.g. fMRI data) with non-imaging data (e.g. age, gender and diagnostic words) in populations for uncertainty-aware disease prediction. They constructed weighted graphs via an edge-variational population graph modeling strategy. In the weighted graphs, the graph nodes are ROIs and the node attributes are features extracted from histology and fMRI images. Notably, the weight of each edge is obtained by a learnable function of the non-imaging measurements of its two nodes. The proposed Monte-Carlo edge dropout randomly drops a fraction of edges in the constructed graphs to reduce overfitting and increase the graph sparsity. In addition, two similar studies (Yu et al.; Song et al.) constructed a sparse graph by combining the information from the functional connectivity (e.g., rs-fMRI), structural connectivity (e.g., DTI), and demographic records (e.g., gender and age) for mild cognitive impairment detection and classification. In these studies, they constructed functional and structural brain networks for each subject (e.g., image). Then, they defined each subject as a graph node. In the first study (Yu et al.), they concatenated the features from functional and structural connectivity as vertex features. Also, they calculated the similarity of features and phenotypic information to construct graph edges. Further, they utilized GCN (Kipf and Welling, 2016) incorporated with the random walk algorithm to enhance the detection performance. The second study (Song et al.) constructed graphs for the functional and structural connectivity separately. Beyond the similarity evaluation of subject (e.g., vertex) features and non-image phenotypic information, this study also determined the edges by directly connecting nodes that belong to the same receptive field class.
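The idea of connecting subjects by combining feature similarity with agreement in phenotypic records can be sketched as follows. This is a generic population-graph construction written under stated assumptions, not the exact rule of either study: the Gaussian similarity on Euclidean distance, the age tolerance of two years, and the dictionary keys `gender` and `age` are all hypothetical choices.

```python
import math

def feature_similarity(u, v, sigma=1.0):
    """Assumed Gaussian similarity between two subject feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * sigma ** 2))

def phenotype_agreement(p, q, age_tol=2):
    """Count matching non-imaging attributes (hypothetical matching rule)."""
    score = 0
    if p["gender"] == q["gender"]:
        score += 1
    if abs(p["age"] - q["age"]) <= age_tol:
        score += 1
    return score

def population_graph(features, phenotypes, sigma=1.0):
    """Symmetric edge-weight matrix with one node per subject."""
    n = len(features)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = (feature_similarity(features[i], features[j], sigma)
                 * phenotype_agreement(phenotypes[i], phenotypes[j]))
            weights[i][j] = weights[j][i] = w
    return weights

# Toy cohort: per-subject vectors stand in for concatenated connectivity features.
features = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]
phenotypes = [
    {"gender": "F", "age": 70},
    {"gender": "F", "age": 71},
    {"gender": "M", "age": 60},
]
weights = population_graph(features, phenotypes)
```

Subjects with no matching phenotypic attribute receive a zero weight, so the resulting graph is sparse even when their imaging features are similar.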
Further, they used the constructed graphs to pretrain a GCN that updates the graphs, followed by a GCN for the final disease deterioration prediction.
Extensive studies have also utilized GCNs in X-ray and Computed Tomography (CT) images for disease analysis (Burwinkel et al.; Zhai et al.; Saha et al., 2021). Different from MRI data, CT images are able to reflect the vessel skeleton information that could assist a variety of clinical tasks. For example, chest CT scans can assist with artery-vein separation, which is of great clinical relevance for chest abnormality detection (Zhai et al.). The graph was constructed from the voxels on the skeletons, resulting in a vertex set and their connections in an adjacency matrix. The skeletons are extracted from chest CT scans by vessel segmentation and skeletonization. In this study (Zhai et al.), GCN layers can extract and learn connectivity information. The one-degree (direct) neighbors were considered, and the vertex attributes were extracted by a CNN model to consider the local image information. In addition, chest CT scans together with GCN are able to assist airway semantic segmentation, which refers to segmenting the airway from the background and dividing it into anatomical segments for lung lobe analysis (Tan et al.). Also, a prototype-based GCN framework (Zhao and Yin) provided a means for airway anomaly detection to aid in lung disease diagnosis. The GCN layers calculated the initial anomaly score for every node, while the prototype-based detection algorithm computed the entire graph's anomaly score. Another study (Chao et al.) utilized radiotherapy CT (RTCT) and PET registered to the RTCT for lymph node gross tumor volume (GTV LN) detection. In this study, the 3D CNN extracted instance-wise visual features while the GNN model analyzed the inter-LN relationship. The feature fusion of CNN and GNN boosted the GTV LN detection performance. Furthermore, the study (Lang et al.)
utilized GCN (Kipf and Welling, 2016) on cone-beam computed tomography (CBCT) images for craniomaxillofacial (CMF) landmark localization, which is important for designing treatment plans of reconstructive surgery. They utilized an attention feature extraction network for localizing landmarks and generating attention features for the graph construction. In addition, the study (Burwinkel et al.) proposed an end-to-end hybrid network to train a CNN and GAT network to leverage both advanced feature learning and inter-class feature representations on the ChestX-ray14 dataset (Rajpurkar et al., 2017). To utilize the image sequencing information, they treat each image from the same patient as a vertex of a graph, and the extracted features are the attributes of the vertices. Furthermore, they leverage non-imaging meta-data, such as clinical information, to construct edges between the vertices. After constructing the graph and updating the graph representation with GAT, they combine the CNN-extracted features with the graph representation through skip connections to achieve a hybrid representation. The motivation for generating the hybrid representation is to improve the distinction between samples. Furthermore, CT images and non-image clinical information could be analyzed jointly for lymph node metastasis (LNM) prediction. The study (Cui et al.) proposed a co-graph convolutional layer consisting of Con-GAT (Veličković et al., 2017) and Corr-GAT layers to obtain a node's new representation by weighted averaging of its neighboring nodes, with the weight score measured by a feature difference-based correlation. Due to the COVID-19 pandemic, GCNs have also been utilized in disease detection. GraphCovidNet (Saha et al., 2021) utilized GIN for COVID-19 detection on both CT and X-ray images. The graph is used to depict the outline of an object (e.g., organ) in the image. First, they applied edge detection to determine the edge outline.
Then, the graph nodes are defined by the pixels having a grayscale intensity value greater than or equal to 128, which implies that nodes reside only on the prominent edges of the edge image. The node attribute consists of the grayscale intensity of the corresponding pixel. An edge exists between two nodes that represent neighboring pixels in the original image. In another study, a GCN-based model (Liu et al., a) extracts node information hierarchically towards both diagnosis and prognosis for COVID-19 patients. Their distance-aware pooling, including graph-based clustering and feature pooling, is able to aggregate node information on the dense graph effectively. Also, the proposed model could coarsely localize the most informative slices of CT scans to provide interpretability for better clinical decision-making. Table 1 summarizes a variety of graph construction methods and GCNs applications in radiologic image analysis. Compared to conventional methods, GCNs methods for the analysis of brain networks have the possibility of combining image-based features with the conventional brain networks.
Nodes are ROIs, and node attributes are seven anatomical features and four functional connectivity statistic features. The graph is a densely connected graph, and edge attributes are calculated by the Pearson correlation-induced similarity.
Liu et al. (b); multi-modality (fMRI and DTI): Nodes are ROIs, and node attributes are rows/columns of the connectivity matrix. The edges are determined by region-to-region correlations.
Zhang et al. (b); multi-modality (rs-fMRI and dMRI): Nodes are ROIs. The edges are determined by the edge set. For the functional brain network, the edge attributes are the correlation of fMRI signals between nodes. For the structural brain network, the edge attributes are the probability of fiber tractography between nodes.
Mo et al. (2021); multi-modality (T1WI and T2WI): Nodes are a group of features in the partial region of the original regular grid coordinates, and node attributes are extracted by a CNN model. The graph is a fully connected graph.
The graph is a fully connected graph.
CT and X-ray images: Nodes are pixels, and node attributes are the grayscale intensity of the pixel. The edges are determined by the neighborhood relationship between pixels.
The growth of digitalized histopathological images presents a valuable resource to support rapid and accurate clinical decision making. The high-resolution whole slide image (WSI) contains rich tissue characteristics including patterns of cell nuclei, glands, and lymphocytes (De Matos et al., 2019; Gurcan et al., 2009). Extensive pathological characteristics of tissue and cell interactions, which are not available in other clinical image data, can be clearly observed. For instance, lymphocytic infiltration of cancer status can be deduced only from histopathology imagery (Zhou et al., 2020b). These pathological patterns can be used to build biological graph networks that can inform disease status and thus discern predictive imaging biomarkers. Overall, we recognize that GCNs analysis is uniquely positioned to address key issues of histopathological applications, including data annotation, tissue connections, global-local information diagnostic fusion, and model prediction performance in challenging settings. First, graph structure provides a reasonable choice to represent the entire slide in terms of tissue content connectivity. Such entire-slide graph representation can avoid fine-grained patch-wise label annotation. This matters because patch-level labeling is highly time-intensive, and it is often impossible for human experts to annotate all ranges of tumor patches.
Second, graph structural representation can capture multi-scale contexts considering both global and local image-wise features towards enhanced prediction of disease outcomes. Third, graph structural representation builds upon the interaction among spatially-separated tiles, which enables a more flexible and comprehensive receptive field. Such advances are analogous to the workflow of human experts, who consider the tumor environment, tissue contents, and their interactions, rather than single tumor tiles, to diagnose the tissue status of patients. Because the high-resolution histopathological image does not present a natural form of graph structure, efficient graph representation becomes a vital factor for model development and optimization. Current graph construction in histopathology can be broadly categorized into patch-based and cell-based methods. First, patch-based graph construction methods aim to enable information extraction by considering the entire micro-environment (e.g., the cells and tissues), where comprehensive tissue micro-environment and cell dynamics can be captured. In these patch-based methods, graph nodes are defined as the selected patches determined by ROIs in the histopathological image. The associated node attributes can be extracted by standard feature extractors (e.g., ResNet18 or VGG16). Graph edges are defined as the connectivity between nodes, which is determined by the feature or coordinate distance between two nodes. A smaller distance means a higher probability of connectivity. The connectivity between nodes determines an adjacency matrix that represents the entire topological structure of the graph. Although the definitions of primitive graph components (e.g., node and edge) are conceptually similar, most patch-based graph construction methods have different settings for node attributes and edge construction.
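A minimal sketch of the patch-based construction described above is given below, assuming that edges are derived from the coordinate distance between patch centers. The k-nearest-neighbor rule, the optional distance cap `max_dist`, and the toy coordinates are assumptions for illustration; published methods differ in exactly these settings, and node attributes from a feature extractor would be stored alongside the adjacency matrix.

```python
import math

def knn_patch_graph(coords, k=2, max_dist=None):
    """Build a symmetric binary adjacency matrix over patch centers.

    coords:   list of (x, y) patch-center coordinates.
    k:        number of nearest neighbors each patch connects to (assumed).
    max_dist: optional cap so that distant patches are never connected (assumed).
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other patches by Euclidean distance to patch i.
        dists = sorted((math.dist(coords[i], coords[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            if max_dist is None or d <= max_dist:
                adj[i][j] = adj[j][i] = 1
    return adj

# Toy slide: three neighboring patches and one distant patch.
coords = [(0, 0), (1, 0), (0, 1), (10, 10)]
adj = knn_patch_graph(coords, k=1, max_dist=2.0)
```

Replacing the coordinates with feature vectors in the distance computation yields the feature-distance variant; in both cases a smaller distance makes an edge more likely, as stated in the text.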
As opposed to the patch-based graphs, cell-based graph methods emphasize the possible biological significance derived from histopathology. Cell-based graph construction methods aim to model the relationship between different cells and the micro-environment (e.g., tissues or vessels) utilizing graph-based features (Zhou et al.). In a cell graph, the detected and segmented nuclei or cell clusters are considered as nodes. The node attribute is defined as the combination of image-wise features, such as features extracted by CNN models, and hand-crafted features, such as the number or the size of nuclei, the average RGB value of the nucleus, gray level co-occurrence matrix features, VGG19 features, and the number of neighbors of a nucleus (Anand et al.). According to the assumption that adjacent cells are more likely to interact (Zhou et al.), the edge between the nodes can be determined via Delaunay triangulation (Keenan et al., 2000) or the K-nearest-neighbour method (Bilgin et al.), which could evaluate whether two cells (nodes) belong to the same cluster. The cells in the same cluster are more likely to have an edge between them. Despite a good performance on clinical classification tasks, these approaches cannot work well in capturing the diagnostic and prognostic information from the surrounding micro-environments (e.g., tissues and vessels). Meanwhile, constructing cell-centered graphs highly depends on cell detection accuracy. It is notable that constructing a cell-based graph and the subsequent graph computing incur a high computational cost.
The process of utilizing GCNs in histopathological image analysis is shown in Fig. 4. We outline several areas of clinical interest for GCNs in histopathology below. Accurate tumor segmentation in histopathology is designed to assist pathologists in improving the workflow efficiency of clinical diagnosis (Anklin et al., 2021).
Graph-based segmentation approaches can incorporate both local and global inter-tissue-region relations to build contextualized segmentation and thus improve the overall performance. For example, SEGGINI performs semantic segmentation of images by constructing a tissue-graph representation and performing weakly-supervised segmentation via node classification by using weak multiplex annotations, i.e., inexact and incomplete annotations, in prostate cancer (Anklin et al., 2021). In this study, they defined graph nodes by merging superpixels based on the channel-wise color similarity of superpixels at higher magnification. The node attribute is determined by the spatial and morphological features of the merged node (e.g., the merged superpixel). The spatial feature is computed by normalizing superpixel centroids by the image size, and the morphological feature is extracted by a pre-trained MobileNetV2 (Sandler et al.). They defined the edges by constructing a region adjacency graph (RAG) (Potjer, 1996) from the spatial connectivity of superpixels. The local and global connection of tissue details creates an alternative avenue for pixel-level segmentation evaluation that draws a contrast to other conventional convolutional-based tumor segmentation approaches (Xue et al.; Wang et al., 2020a; Qu et al., 2020). Another study proposed an end-to-end framework that utilizes an unsupervised pretrained CNN to extract tile features and generate dynamic superpixels for graph construction, while using GCN for predicting the final segmentation map. In this study, the dynamic superpixels, which are generated according to the CNN feature extraction, can be viewed as a key bridge between the CNN and the GCN model.
Cancer subtype classification is crucial in clinical image analysis and can impact patient stratification, outcome assessment, and treatment development (Adnan et al.; Zhao et al.).
GCNs have been extensively studied in cancer subtype classification due to their unique ability to explore the relational features among tissue sub-regions (e.g., patches or cells). Patch-based graph construction approaches are intuitive to build a bridge between image features and graph structure. Conceptually, patches are defined as nodes and node attributes are extracted patch features, including CNN-extracted and hand-crafted features. The edges are typically determined by the Euclidean distance between nodes. For example, the combination of ChebNet (Defferrard et al., 2016) and GraphSage (Hamilton et al.) is useful for classifying lung cancer subtypes in histopathological images (Adnan et al.) via patch selection. All patches in the tissue region are grouped into multiple classes, and a portion of all clustered patches (e.g., 10%) are randomly selected within each class. Also, a simplified graph construction process (Adnan et al.) can be useful to leverage all patch information. The global context among patches is considered by using a fully connected graph to represent the connection among nodes. Global pooling layers (e.g., global attention, max, and sum poolings) are able to generate graph representations for cancer classification. In particular, global attention pooling (Li et al., 2015) provides strong interpretability to determine which nodes are relevant to the current graph-level classification tasks. In colorectal cancer histopathology, ChebNet (Defferrard et al., 2016) shows its predictive power in lymph node metastasis (LNM) prediction (Zhao et al.). Interestingly, a combination model of a variational autoencoder and generative adversarial network (VAE-GAN) (Larsen et al.) is trained as a feature extractor to decode the latent representations closer to their original data space. Further, the pixel-based graph construction could be understood as a variant of patch-based approaches.
The study (Gao et al., b) developed a group quadratic graph convolutional network for breast tissue and grade classification on pixel-based graph representation. The proposed model reduces redundant node (e.g., pixel) features, selects superior fusion features, and enhances the representation ability of the graph convolutional unit through the pixel-based graph analysis. As opposed to patch-based approaches, cell-based graph construction rests on the key assumption that cell-cell interactions are the most salient points of information (Levy et al.). A common example is to define the detected nuclei as nodes (Anand et al.), while the overall node attributes are aggregated by concatenating multiple types of features (see Table 2). The graph edge is determined by thresholding the Euclidean distance between nodes. In addition, the cell graph convolutional network (Zhou et al.) presents a generalized framework for grading colorectal cancer histopathological images based on the combination of GraphSage (Hamilton et al.), JK-Net (Xu et al.), and Diffpooling (Ying et al., 2018). The edge between two nuclei is determined by a fixed distance, while the maximum degree of each node is set to k, corresponding to its k-nearest neighbors. Sharing a similar cell-graph construction strategy and graph component definition with (Zhou et al.), a GIN-based (Xu et al., 2018) framework is designed for breast cancer subtype classification. In addition, the clinical interpretation is provided by a cell-graph explainer that is inspired by a previous graph explainer (Ying et al., 2019), a post-hoc interpretability method based on graph pruning optimization. The cell-graph explainer is able to prune the redundant graph components, such as the nodes that could not provide enough information in the decision making, and define the resulting subgraph as the explanation. Another cell graph application of cancer classification (Sureka et al.)
is built on top of robust spatial filtering (RSF), where RSF is combined with attention mechanisms to rank the graph vertices in their relative order of importance, providing visualizable results on breast cancer and prostate cancer classification. To leverage the advantages of patch- and cell-based graphs simultaneously, model integration can provide additional auxiliary benefits by capturing detailed nuclei and micro-environment tissue information. A hierarchical cell-to-tissue graph neural network (HACT-Net) is an example that consists of a low-level cell-based graph (e.g., cell-graph), a high-level patch-based graph (e.g., tissue-graph), and a hierarchical cell-to-tissue representation for breast carcinoma subtype classification. For the cell-based graph, they defined nuclei as graph nodes that are detected by the pre-trained Hover-Net (Chilamkurthy et al., 2018; Kumar et al., 2017; Graham et al., 2019). For the patch-based graph, they determined graph nodes and their attributes by creating non-overlapping homogeneous superpixels and their features. The edges are constructed by a region adjacency graph (Potjer, 1996) using the spatial centroids of the superpixels. Overall, such a joint analysis across histopathological scales leads to enhanced performance for cancer subtype classification.
Cancer staging and grade classification is also of clinical significance and comprises tumor tissue and nodal (e.g., tumor and lymph nodes) staging (Levy et al.). The patch-based graph construction strategy is commonly used in tumor staging classification in terms of graph attention (Raju et al.). Also, graph topological feature extraction is useful in colon cancer tumor stage prediction with good interpretability (Levy et al.).
In particular, they utilized the Mapper (Bodnar et al., 2020) to project high-dimensional graph representation to a lower-dimensional space, summarizing higher-order architectural relationships between patch-level histological information to provide more favorable interpretations for histopathologists. Further, for liver fibrosis stage classification, the study (Wojciechowska et al.) proposed a patch-based graph structure together with the GCN attention layer to analyze the spatial organization of the fibrosis patterns. They use the KNN algorithm to cluster the tiles to select regions of high collagen content as the centroid nodes for graph construction. The proposed pipeline allows for the separation of fibers in the slide into localized fibrosis patterns, and the individual regions can be inspected by a pathologist (Wojciechowska et al.). Also, the cell-based graph construction strategy is useful for cancer grade classification. For example, for prostate cancer grade classification (Wang et al., a) in tissue micro-arrays, GraphSage (Hamilton et al.) learns the global distribution of cell nuclei, cell morphometry, and spatial features without requiring pixel-level annotation. In this study, the cell nuclei are the nodes of the graph, while three types of features constitute the node attribute: the morphological features (e.g., the area, roundness, eccentricity, convexity, and orientation of each nucleus), texture features (e.g., the dissimilarity, homogeneity, energy, and ASM based on the obtained grey level co-occurrence matrix), and contrastive predictive coding features.
Survival analysis is a long-standing clinical task to determine the prognostic likelihood of patients (Wang et al., 2021). Both cell- and patch-based approaches can be considered to capture survival-sensitive information of patients.
For instance, the graph convolutional neural network with attention learning has been shown to achieve good performance on survival prediction in colorectal cancer (Li et al., d). Tumor tiles are defined as nodes, and node attributes are extracted by VGG16. Graph edges are constructed by thresholding the Euclidean distances between node attributes. After constructing the graph, they used the ChebNet (Defferrard et al., 2016) framework for survival analysis on the histopathological images. With a similar definition of graph components, another study (Chen et al., d) designed a patch-based graph construction strategy in the Euclidean space. They utilized the DeepGCN (Li et al., a) and a global attention layer to boost the survival prediction performance and provided interpretability across five cancer types. An integrated framework extracted morphological features from histology images using CNNs and from the constructed cell-based graph using GraphSage (Hamilton et al.), and also genomic (mutations, CNV, RNA-Seq) features using SNNs. A fusion of these deep features using the Kronecker product is of great interest for accurate survival outcome prediction. In addition, cell-based and patch-based graphs can be further unified to allow a trade-off between efficiency and granularity (Wang et al., c). They used GAT for prostate cancer survival prediction using WSIs. Notably, a self-supervised learning method is proposed to pretrain the model, yielding improved performance over trained-from-scratch counterparts. For cell-based graphs, they use a Mask R-CNN (He et al.) for nuclei segmentation and define an eight-pixel-wide ring-like neighborhood region around each nucleus as its cytoplasm area. The nuclear morphometry features and visual texture features (intensity, gradient, and Haralick features) contribute substantially to the nuclear and cytoplasm region representations, respectively.
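The Kronecker-product fusion of modality-specific feature vectors mentioned above can be sketched in a few lines. The function name and the toy vectors are hypothetical, and the optional `append_one` flag reflects a variation in which a constant 1 is appended to each vector so that the fused vector also retains lower-order terms; whether a given study uses that variation is an assumption here, not something stated in this review.

```python
def kronecker_fusion(*vectors, append_one=False):
    """Fuse feature vectors by taking their Kronecker (outer) product, flattened.

    vectors:    one feature vector per modality (e.g., image, graph, genomic).
    append_one: if True, append a constant 1.0 to each vector first (assumed variant).
    """
    fused = [1.0]
    for vec in vectors:
        vec = list(vec) + [1.0] if append_one else list(vec)
        # Every existing fused entry is multiplied by every entry of the new vector.
        fused = [a * b for a in fused for b in vec]
    return fused

# Toy embeddings standing in for CNN, graph, and genomic features.
image_vec = [1.0, 2.0]
graph_vec = [3.0, 4.0]
genomic_vec = [5.0, 6.0]

pairwise = kronecker_fusion(image_vec, graph_vec)
trimodal = kronecker_fusion(image_vec, graph_vec, genomic_vec)
```

The fused dimension is the product of the input dimensions, which is why such fusion is usually applied to compact embeddings rather than raw features.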
Despite these advances, uncertainty remains about the definitive roles of cell-level and patch-level characteristics with regard to the overall survival likelihood of patients.

Image-based molecular biomarker prediction is promising to deepen our understanding of cancer biology across data modalities. Efforts to explore multiple image-to-genome associations in cancer research are gaining momentum (Kather et al., 2019, 2020; Fu et al., 2020). The feature-enhanced graph network (FENet) (Ding et al.) leverages a histopathology-based graph structure to predict key molecular outcomes in colon cancer. Through the spatial measurement of tumor patches, the image-to-graph transformation illustrates its unique value in predicting key genetic mutations. In particular, the GIN (Xu et al., 2018) layer and the jumping knowledge structure are useful to aggregate and update the patch embedding information. Alternatively, the cell-based construction method is a viable option for cancer biomarker prediction (Lu et al.). HoverNet (Graham et al., 2019) is a popular choice for nuclei segmentation to support cell graph construction. Next, agglomerative clustering (Müllner, 2011) is used to group spatially neighboring nuclei into clusters. These clusters are defined as graph nodes, and the node attribute is determined by the standard deviation of nuclei sizes. The edges are constructed by Delaunay triangulation on the geometric coordinates of the cluster centers, with a maximum distance connectivity threshold. Both cell- and patch-based approaches contribute to the integration of histopathology and genomic data as more biological data become accessible.
We recognize that graph-based models can offer an efficient means to measure cross-modality differences, which requires careful choices in graph construction, model layer architecture, and feature extraction design to achieve improved performance of molecular outcome prediction. Overall, Table 2 summarizes the category, type of tasks, and the graph-structure construction strategies. In this chapter, we have discussed novel perspectives for computational histopathological image analysis. In particular, GCN-based methods provide a novel perspective to consider tumor heterogeneity in histopathological image analysis. Despite multiple challenges, the evolving capacity of current graph construction strategies (edge, node, and node attributes) makes it possible to address a variety of clinical tasks using histopathological images.

Table entry (cell-cluster graph construction): Nodes are the geometric centers of nuclei clusters, and node attributes are determined by the count of the six nuclei types and the standard deviation of nuclear sizes. The edges are determined by Delaunay triangulation between cluster centers with a maximum distance threshold.

GCNs have demonstrated their analytical ability in alternative medical image disciplines to facilitate structural analysis for disease diagnosis (e.g., eye disease and skin lesion), surgery scene understanding, and bone age assessment. For instance, GCNs have been studied in dermatology and eye-related diseases, involving retinal, fundus, and fluorescein angiography (FA) images (Shin et al., 2019; Noh et al.). Similar to radiological and histopathological images, patch-based graph construction strategies are widely used in the above image domains. GCNs have been shown to be valuable for learning vessel shape structures and local appearance for vessel segmentation in retinal images (Shin et al., 2019). Also, GCNs were applied to artery and vein classification using both fundus images and the corresponding FA images (Noh et al.).
With a designed graph U-Nets architecture (Gao and Ji), the high-level connectivity of vascular structures can be learned from node clustering in the node pooling layers. In addition, a study (Meng et al.) proposed a framework that combines a CNN and a ResGCN (Li et al., a) model to enhance the segmentation performance of the fetal head on ultrasound images. Furthermore, GCNs show their power in differential diagnosis of skin conditions using clinical images (Wu et al.). This problem is formulated as a multi-label classification task over 80 conditions when only incomplete image labels are available. The label incompleteness is addressed by combining a classification network with a graph convolutional network that characterizes label co-occurrence (Wu et al.). Each clinical image is defined as a graph node, and the connectivity between two nodes is determined by domain knowledge of skin conditions from board-certified dermatologists. Notably, the edges come from human expert input: two dermatologists each provide overlapping differential diagnosis groups, and an edge is connected when two labels appear together in at least one differential group from both dermatologists. In addition, a cell-based graph analysis combines multiple types of GCNs with graph pooling, including GraphSage (Hamilton et al.), GIN (Xu et al., 2018), and GCN (Kipf and Welling, 2016), for survival prediction of gastric cancer using multiplexed immunohistochemistry (mIHC) images. The graph nodes are determined by six antibodies (PanCK, CD8, CD68, CD163, Foxp3, and PD-L1), which were used as annotation indicators for six different types of cells. The node attributes are determined by cell locations, types, and morphological features. The edges are constructed by the maximum effective distance between immune and tumor cells, which is equivalent to 40 pixels at the magnification of this study. Furthermore, a surgery scene graph analysis (Islam et al.)
utilized GraphSage to predict surgical interactions between instruments and surgical regions of interest. In addition, GCNs provide the potential for automatic bone age assessment and ROI score prediction on hand radiographs (Chen et al., c). This study proposed an anatomy-based group convolution block to predict the ROI scores by processing the local features of ROIs. They also presented a dual graph-based attention module to compute patient-specific attention and context attention for ROI score prediction.

Table entries (graph construction by application):
Nodes are vessel pixels within N × N local patches, and node attributes are extracted by graph U-Nets. The edges are constructed with existing vessel pixels within an N × N local patch.
Optic disc and cup segmentation and fetal head segmentation: Nodes are the object boundaries, divided into N vertices at equal intervals, and node attributes are extracted by a CNN. Edges are formed by connecting every two consecutive vertices on the boundary with the center vertex to form a triangle.
Skin clinical images: Nodes are images, and node attributes are extracted by a CNN. The edges are connected when two labels appear in at least one differential group by both dermatologists, and edge attributes are computed from C(i, j), the number of images that carry both labels i and j, and C(i) or C(j), the number of images in class i or j.

GCNs (Ding et al.; Zhao et al.; Li et al., d) and their extensions have grown rapidly and are increasingly utilized for processing, integrating, and analyzing multi-modality medical imaging and other types of biological data. We here discuss several future research directions and common challenges to advance the research in medical image analysis and related research fields. We particularly outline key aspects of importance, including GCN model interpretation, the value of pre-trained models, evaluation pipelines, large-scale benchmarks, and emerging technical insights.
The interpretation of GCNs is of heightened interest to make the outcome understandable, ensure model validity, and enhance clinical relevance. In our focus, a well-designed interpretation framework for GCNs is expected to provide explanation and visualization for understanding both image-wise and graph components. Such an interpretative ability can be highly attractive to clinicians when diagnosing regions of interest in histopathology, enabling an understanding of spatial and regional interactions from graph structures (Ding et al.). As demonstrated, three metrics are useful to design and understand the interpretation capability of GCNs: (1) fidelity refers to the importance of classification as measured by the impact of node attributes, (2) contrastivity points to the significance with respect to different classes, and (3) sparsity reflects the sparseness level on a graph. These metrics can help generate and measure valuable heat maps of graph nodes given their attributes. Representative approaches include gradient class activation mapping (Grad-CAM), contrastive excitation backpropagation (c-EB), and contrastive gradient (CG) (Pope et al.). Further, we recognize that emerging studies have explored interpretation strategies designed specifically for GCNs (Ying et al., 2019). For instance, an ROI-selection pooling layer (R-pool) highlights the node importance for predicting neurological disorders by removing noisy nodes to realize a dimension reduction of the entire graph. Beyond node-level feature interpretation, additional efforts are greatly needed on interpreting the relational information in graphs. GnnExplainer (Ying et al., 2019) is an example that leverages the recursive neighborhood-aggregation scheme to identify graph pathways as well as node feature information passed along the edges. The design of GnnExplainer is appealing for visualizing the detailed cell-graph structure and providing class-specific interpretation for breast cancer.
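To make two of the three interpretation metrics named above concrete, the following sketch computes contrastivity and sparsity from per-node saliency scores. It is one possible formalization under stated assumptions: the saliency values are toy numbers, the binarization threshold of 0.01 is hypothetical, and the function names are illustrative. Fidelity is omitted because it requires re-running a trained classifier with salient nodes occluded.

```python
from typing import List, Sequence


def binarize(saliency: Sequence[float], threshold: float = 0.01) -> List[int]:
    """Mark a node as identified when its saliency exceeds the threshold."""
    return [1 if s > threshold else 0 for s in saliency]


def union_size(mask_pos: Sequence[int], mask_neg: Sequence[int]) -> int:
    """Number of nodes identified by at least one of the two class masks."""
    return sum(1 for p, n in zip(mask_pos, mask_neg) if p or n)


def contrastivity(mask_pos: Sequence[int], mask_neg: Sequence[int]) -> float:
    """Hamming distance between the class-specific masks, divided by the
    number of nodes identified by either mask (0.0 if none are identified)."""
    hamming = sum(1 for p, n in zip(mask_pos, mask_neg) if p != n)
    union = union_size(mask_pos, mask_neg)
    return hamming / union if union else 0.0


def sparsity(mask_pos: Sequence[int], mask_neg: Sequence[int]) -> float:
    """One minus the fraction of graph nodes identified by either mask."""
    return 1.0 - union_size(mask_pos, mask_neg) / len(mask_pos)


# Toy saliency maps (hypothetical) over a 5-node graph for two classes.
mask_pos = binarize([0.9, 0.0, 0.4, 0.0, 0.0])
mask_neg = binarize([0.0, 0.0, 0.5, 0.7, 0.0])

print(contrastivity(mask_pos, mask_neg))  # 2 differing nodes out of 3 identified
print(sparsity(mask_pos, mask_neg))       # 3 of 5 nodes identified
```

Higher contrastivity means the two class explanations highlight different nodes, and higher sparsity means the explanation is confined to a smaller part of the graph.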
As a result, we emphasize that the interpretation process should consider an in-depth joint understanding of the clinical task, graph model architecture, and model performance.

Pretraining GCNs aims to train a model on tasks with a sufficient amount of data and labels and then fine-tune the model on downstream tasks. Pre-trained GCNs can serve as a foundation model to improve generalization when the training set is small, as is often the case in medical imaging (Hu et al., 2019). The pretraining workflow of GCNs typically includes the model training rules, hyper-parameter settings, and constructed-graph augmentation strategies. A key pretraining scheme for a graph-level task is to reconstruct the vertex adjacency information (e.g., GraphSAGE) without hurting intrinsic structural information (You et al., 2020). We offer several compelling directions for pretraining strategies to improve the robustness of GCN models and their utility in different tasks. First, graph-wise augmentation strategies have considerable room to facilitate the pretraining of graphs. For instance, out-of-distribution samples can be analyzed via node-level and graph-level augmentations (Hu et al., 2019; You et al., 2020). Second, exploring label-efficient models (e.g., unsupervised or self-supervised learning) in conjunction with pretraining strategies (You et al., 2020) could greatly alleviate the labeling shortage. Notable studies (Hu et al., 2019; You et al., 2020; Qiu et al.) have achieved good performance in downstream tasks while leveraging graph-based pretraining strategies. Considering the above directions, a self-supervised learning framework for GCN pretraining (You et al., 2020) demonstrates that graph-wise augmentation strategies are useful to address graph data heterogeneity.
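As a rough illustration of such graph-wise augmentations, the sketch below implements node dropping and attribute masking, two operations used to create augmented views in graph contrastive pretraining (You et al., 2020). The graph representation, the function names, and the 20% ratio are assumptions of this sketch and are not taken from the cited framework.

```python
import random
from typing import Dict, List, Set, Tuple

# A graph is (node features keyed by node id, set of undirected edges).
Graph = Tuple[Dict[int, List[float]], Set[Tuple[int, int]]]


def drop_nodes(graph: Graph, ratio: float, rng: random.Random) -> Graph:
    """Remove a random fraction of nodes together with their incident edges."""
    feats, edges = graph
    n_drop = int(len(feats) * ratio)
    dropped = set(rng.sample(sorted(feats), n_drop))
    kept_feats = {n: list(f) for n, f in feats.items() if n not in dropped}
    kept_edges = {(u, v) for u, v in edges if u not in dropped and v not in dropped}
    return kept_feats, kept_edges


def mask_attributes(graph: Graph, ratio: float, rng: random.Random) -> Graph:
    """Replace the feature vectors of a random fraction of nodes with zeros."""
    feats, edges = graph
    n_mask = int(len(feats) * ratio)
    masked = set(rng.sample(sorted(feats), n_mask))
    new_feats = {n: ([0.0] * len(f) if n in masked else list(f)) for n, f in feats.items()}
    return new_feats, set(edges)


# Toy ring graph with 10 nodes and non-zero 2-D features (hypothetical data).
features = {i: [float(i + 1), 1.0] for i in range(10)}
ring_edges = {(i, (i + 1) % 10) for i in range(10)}
graph: Graph = (features, ring_edges)

rng = random.Random(0)
view_a = drop_nodes(graph, ratio=0.2, rng=rng)       # first augmented view
view_b = mask_attributes(graph, ratio=0.2, rng=rng)  # second augmented view
print(len(view_a[0]))  # 8 nodes remain after dropping 20% of 10 nodes
```

In a contrastive setup, the two views of the same graph would be encoded by the GCN and their embeddings pulled together, while views of different graphs are pushed apart.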
The pretraining is performed by maximizing the agreement between two augmented views of the same graph, obtained via node dropping, edge perturbation, attribute masking, and subgraph selection. Notably, only a small portion of the graph components is changed, while the semantic meaning of the graph is preserved. Such a strategy brings the graph data diversity that is greatly needed for building robust pre-trained GCN models. Taken together, research on pretraining GCNs and their practical impact is only beginning and will continue to make progress on downstream image-related clinical tasks.

The evaluation of graph construction in medical imaging is vital because the graph construction can significantly affect model performance and the interpretation of outcomes. General graph construction methods use ROIs (e.g., image patches or brain neurons) as graph nodes, and the node attributes are obtained by standard feature extractors (e.g., ResNet18). Edges represent the connections between nodes, which can be determined by the Euclidean distance between node features, or the connections between ROIs, which are determined by patch coordinates or the actual neural fiber connections. Currently, graph construction strategies are applied in different tasks, and a generalized strategy for evaluating graph construction has not yet been developed. It is even more difficult to determine which kind of graph construction generalizes for task-specific medical image analysis because datasets and graph construction metrics vary. Developing such an evaluation strategy is also necessary for GCNs to better process medical image data across multiple modalities, because model performance is highly related to the quality of the constructed graph-structured data.
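The two edge rules described above (distance between node features, and proximity of patch coordinates) can be sketched as follows. This is a minimal illustration under assumptions: the feature vectors and coordinates are toy values, and the distance threshold and the neighbor count `k` are hypothetical hyper-parameters that would need tuning per dataset.

```python
import math
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_edges(features: List[Sequence[float]], max_dist: float) -> Set[Edge]:
    """Connect two nodes when their feature vectors are within max_dist."""
    n = len(features)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if euclidean(features[i], features[j]) <= max_dist
    }


def knn_edges(coords: List[Sequence[float]], k: int) -> Set[Edge]:
    """Connect each node to its k spatially nearest nodes (undirected)."""
    edges: Set[Edge] = set()
    for i, c in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: euclidean(c, coords[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy patch features (e.g., from a CNN extractor) and patch coordinates.
patch_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.1]]
patch_coords = [(0, 0), (1, 0), (0, 1), (10, 10)]

print(sorted(threshold_edges(patch_features, max_dist=0.5)))  # [(0, 1), (2, 3)]
print(sorted(knn_edges(patch_coords, k=1)))
```

The two rules can produce very different graphs from the same slide, which is one reason a shared evaluation strategy for graph construction would be useful.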
The benchmarking framework (Errica et al., 2019) has rigorously evaluated the performance of graph neural networks on medium-scale datasets and demonstrates its usefulness for analyzing message-passing capability in GCNs. Also, a comparison strategy among multiple GCNs (Kather et al., 2020) can address the issues of reproducibility and replicability. Following the graph evaluation (Kather et al., 2019, 2020), we need to define statistical distinctions to ensure the performance of GCNs. For example, model training and human understanding both benefit if the graph structure and feature distributions differ significantly between positive and negative patient samples.

Despite the remarkable effort on standardization of medical imaging cohorts, a high-quality, large-scale graph-defined benchmark is not yet readily available for AI model evaluation, especially in medical image analysis. Open Graph Benchmark (OGB) exemplifies an initiative that contains a diverse set of real-world benchmark datasets (e.g., protein, drug, and molecular elements) to facilitate scalable and reproducible graph machine learning research (Hu et al., 2020). The number of graphs and the number of nodes in each graph are both massive in OGB. Even small-scale OGB graphs can have more than 100 thousand nodes or more than 1 million edges. This comprehensive dataset across various domains can be viewed as a baseline to support the development and comparison of GCNs. Related works have been explored on graphs including a chemistry dataset with 2 million graphs and a biology dataset with 395K graphs (Hu et al., 2019). As seen in OGB development, there are challenges in collecting and processing suitable medical image datasets and constructing meaningful graphs following the image-to-graph transformation. First, we need to collect a large number of medical images across multiple centers to ensure data diversity.
It is also essential to provide detailed annotation information for collected datasets on the image-level region of interest. Second, graph-wise statistics are important to allow measurement of graph-level dynamics. Notable graph metrics (Zamanitajeddin et al.), such as the average node degree, clustering coefficient, closeness centrality, and betweenness centrality, can be used to assess graph characteristics and help determine unique graph structures. For instance, the average node degree calculates the average degree of the neighborhood of each node to delineate the connectivity between nodes and their neighbors. The clustering coefficient measures how strongly nodes in the graph tend to cluster together. Closeness centrality highlights nodes that can easily access other nodes. Third, we must carefully design image-graph components, such as the definition of graph nodes in different types of graphs, that are vital to downstream clinical tasks. Finally, the real impact of pre-trained foundation models on large-scale graph-wise datasets still needs to be explored. When pretrained GCNs are used to address data-efficiency issues in medical image analysis, the models can be trained on a large-scale graph-wise dataset and then adapted to specific tasks, even with a limited amount of downstream data.

The rapid development of deep learning is bringing novel perspectives to address the challenges of graph-based image analysis. The transformer architecture (Vaswani et al.), emphasizing the use of a self-attention mechanism to explore long-range sequential knowledge, has emerged to improve model performance in a variety of natural language processing (NLP) (Devlin et al., 2018) and computer vision tasks (Dosovitskiy et al., 2020). A graph-wise transformer can capture both local and global contexts, thus holding the promise to overcome the limitation of spatial-temporal graph convolutions.
For example, graph convolutional skeleton transformers integrate dynamical attention and global context with the local topology structure in GCNs (Bai et al., 2021), while the spatial transformer attention module discovers the global correlations between the bone-connected and the approximated connected joints of the graph topology. In medical image analysis, the combination of GCNs and transformer models can be favored to process 3D MRI sequences to boost model prediction performance, where GCNs explore the topological features while transformers model the temporal relationship among MRI sequences.

In the meantime, self-supervised learning strategies are emerging in graph-driven analysis with limited availability of imaging data. Notably, self-supervised learning (SSL) provides a means to pretrain a model with unlabeled data, followed by fine-tuning the model for a downstream task with limited annotations (Chaitanya et al., 2020). Contrastive learning (CL), a particular variant of SSL, introduces a contrastive loss that enforces representations to be close for similar pairs and far apart for dissimilar pairs (Chaitanya et al., 2020; Xie et al.). Another technique to address the limitation of data labeling is self-training, which generates pseudo-labels for model retraining and optimization (Liu et al., d). A self-training method for MRI segmentation has shown potential for cross-scanner and cross-center data analysis tasks (Liu et al., d). The teacher-student framework is another type of self-training: a teacher model is trained with labeled data and used to annotate the unlabeled data, and the labeled data and pseudo-labeled data then jointly train a student model (He et al.). Overall, both self-supervised learning and self-training strategies can be utilized in GCN model training to potentially improve model performance and overcome the annotation and data scale challenges.
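The idea of a contrastive loss that pulls similar pairs together and pushes dissimilar pairs apart can be illustrated with a classical margin-based pairwise form. This is a sketch under assumptions: the cited works may use other formulations, and the embeddings, the margin value, and the function name `pair_contrastive_loss` are hypothetical.

```python
import math
from typing import Sequence


def pair_contrastive_loss(
    z1: Sequence[float], z2: Sequence[float], similar: bool, margin: float = 1.0
) -> float:
    """Margin-based contrastive loss for a single pair of embeddings.

    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only when they are closer than the margin.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z2)))
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy graph-level embeddings (hypothetical values).
z_anchor = [0.0, 0.0]
z_near = [0.3, 0.4]   # distance 0.5 from the anchor
z_far = [3.0, 4.0]    # distance 5.0 from the anchor

print(pair_contrastive_loss(z_anchor, z_near, similar=True))   # small penalty for a close similar pair
print(pair_contrastive_loss(z_anchor, z_near, similar=False))  # penalized: dissimilar pair inside the margin
print(pair_contrastive_loss(z_anchor, z_far, similar=False))   # no penalty: already beyond the margin
```

In graph pretraining, the similar pair would typically be two augmented views of the same graph and the dissimilar pair two views of different graphs.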
We have witnessed a growing trend of graph convolutional networks applied to medical image analysis over the past few years. The convergence of GCNs, medical imaging data, and other clinical data, brings advances into outcome interpretation, disease understanding, and novel insights into data-driven model assessment. These breakthroughs, together with data fusion ability, local and global feature inference, and model training efficiency, lead to a wide range of downstream applications across clinical imaging fields. Nevertheless, the development of benchmark graph-based medical datasets is yet to be established. Consistency and validity of graph construction strategy in medical imaging are greatly needed in future research. Recent technological advances can be considered to enhance and optimize GCNs in addressing challenging problems. We hope that the gleaned insights of this review will serve as a guideline for researchers on graph-driven deep learning across medical imaging disciplines and will inspire continued efforts on data-driven biomedical research and healthcare applications. 
A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs Representation learning of histopathology images using graph neural networks Histographs: graphs in histopathology Learning whole-slide segmentation from inexact and incomplete labels using tissue graphs Advances in neural information processing systems Gcst: Graph convolutional skeleton transformer for action recognition Mincut pooling in graph neural networks Cell-graph mining for breast tissue modeling and classification Deep graph mapper: Seeing graphs through the neural lens Spectral networks and locally connected networks on graphs Adaptive image-feature learning for disease classification using inductive graph networks Towards sparse hierarchical graph classifiers Contrastive learning of global and local features for medical image segmentation with limited annotations Lymph node gross tumor volume detection in oncology imaging via relationship learning using graph neural network Measuring and relieving the over-smoothing problem for graph neural networks from the topological view Estimating tissue microstructure with undersampled diffusion data via graph convolutional neural networks Doctor imitator: A graph-based bone age assessment framework using hand radiographs Whole slide images are 2d point clouds: Context-aware survival prediction using patch-based graph convolutional networks Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis Deep learning algorithms for detection of critical findings in head ct scans: a retrospective study Co-graph attention reasoning based imaging and clinical features integration for lymph node metastasis prediction Histopathologic image processing: A review Convolutional neural networks on graphs with fast localized spectral filtering Signed graph convolutional networks Bert: Pre-training of deep bidirectional transformers for language 
understanding The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism Edge contraction pooling for graph neural networks Towards graph pooling by edge contraction Feature-enhanced graph networks for genetic mutational prediction using histopathological images in colon cancer An image is worth 16x16 words: Transformers for image recognition at scale A fair comparison of graph neural networks for graph classification Topological tumor graphs: a graph-based spatial model to infer stromal recruitment for immunosuppression in melanoma histology Synthetic data augmentation using gan for improved liver lesion classification Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis Spatio-temporal graph convolution for resting-state fmri analysis Graph u-nets, in: international conference on machine learning Utnet: a hybrid transformer architecture for medical image segmentation Gq-gcn: Group quadratic graph convolutional network for classification of histopathological images Neural message passing for quantum chemistry Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images Radiologist level accuracy using deep learning for hemorrhage detection in ct scans Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining Histopathological image analysis: A review Inductive representation learning on large graphs Brain tumor segmentation with deep neural networks Graph analysis of functional brain networks in patients with mild traumatic brain injury Open graph benchmark: Datasets for machine learning on graphs Strategies for pre-training graph neural networks Edge-variational graph convolutional networks for uncertainty-aware disease prediction Segmentation of organs-at-risks in head and neck ct images using convolutional neural networks Learning and reasoning with the graph structure representation in 
robotic surgery Towards explainable graph representations in digital pathology Graph analysis of spontaneous brain network using eeg source connectivity Pan-cancer image-based detection of clinically actionable genetic alterations Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer Molecular graph convolutions: moving beyond fingerprints An automated machine vision system for the histological grading of cervical intraepithelial neoplasia (cin) Semi-supervised classification with graph convolutional networks Understanding attention and generalization in graph neural networks Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier Residual and plain convolutional neural networks for 3d brain mri classification Metric learning with spectral graph convolutions on brain connectivity networks A dataset and a technique for generalized nuclear segmentation for computational pathology Optimal deep learning model for classification of lung cancer on ct images Automatic localization of landmarks in craniomaxillofacial cbct images using a local attention-based graph convolution network Autoencoding beyond pixels using a learned similarity metric Self-attention graph pooling Cayleynets: Graph convolutional neural networks with complex rational spectral filters Topological feature extraction and visualization of whole slide images using graph neural networks Deepgcns: Can gcns go as deep as cnns? 
Deeper insights into graph convolutional networks for semi-supervised learning Adaptive graph convolutional neural networks Graph cnn for survival analysis on whole slide pathological images Graph neural network for interpreting task-fmri biomarkers Braingnn: Interpretable brain graph neural network for fmri analysis Pooling regularized graph neural network for fmri biomarker analysis Gated graph sequence neural networks Diffusion convolutional recurrent neural network: Data-driven traffic forecasting Detecting changes of functional connectivity by dynamic graph embedding learning Learning dynamic graph embeddings for accurate detection of cognitive state changes in functional brain networks Beyond covid-19 diagnosis: Prognosis with hierarchical graph representation learning Computational network biology: data, models, and applications Community-preserving graph convolutions for structural and functional joint embedding of brain networks Towards deeper graph neural networks Generative self-training for cross-domain unsupervised tagged-to-cine mri synthesis Neurophysiological investigation of the basis of the fmri signal Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops The parkinson progression marker initiative (ppmi) Cnn-gcn aggregation enabled boundary regression for biomedical image segmentation Rethinking pooling in graph neural networks Multimodal priors guided segmentation of liver lesions in mri using mutual information based graph co-attention networks Mutual information-based graph co-attention networks for multimodal prior-guided magnetic resonance imaging segmentation Encyclopedia of biomedical engineering Combining fundus images and fluorescein angiography for artery/vein classification using the hierarchical vessel graph network Hact-net: A hierarchical cell-to-tissue graph neural network for histopathological image classification Online learning of social representations Column networks for collective 
classification Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Region adjacency graphs and connected morphological operators Graph contrastive coding for graph neural network pre-training Weakly supervised deep nuclei segmentation using partial points annotation in histopathology images Genetic mutation and biological pathway prediction based on whole slide images in breast carcinoma using deep learning Radiologist-level pneumonia detection on chest x-rays with deep learning Graph attention multi-instance learning for accurate colorectal cancer staging Graphcovidnet: A graph neural network based model for detecting covid-19 from ct scans and x-rays of chest Proceedings of the IEEE conference on computer vision and pattern recognition Imaging of the human brain in health and disease Multi-scale convolutional neural networks for lung nodule classification Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification Dynamic edge-conditioned filters in convolutional neural networks on graphs Integrating similarity awareness and adaptive calibration in graph convolution network to predict disease Contributions and challenges for network models in cognitive neuroscience Robust spatial filtering with graph convolutional neural networks Uk biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE) Structure-aware graph-based network for airway semantic segmentation Automated anatomical labeling of activations in spm using a macroscopic anatomical parcellation of the mni mri single-subject brain Attention is all you need Automatic ischemic stroke lesion segmentation from computed tomography perfusion images by image synthesis and attention-based deep neural networks Weakly supervised prostate tma classification via graph convolutional networks Central focused 
convolutional neural networks: Developing a data-driven model for lung nodule segmentation Spectral pyramid graph attention network for hyperspectral image classification Heterogeneous graph attention network Cell graph neural networks enable digital staging of tumour microenvironment and precisely predict patient survival in gastric cancer Hierarchical graph pathomic network for progression free survival prediction Early detection of liver fibrosis using graph convolutional networks Learning differential diagnosis of skin conditions with co-occurrence supervision using graph convolutional networks A comprehensive survey on graph neural networks Self-training with noisy student improves imagenet classification Representation learning on graphs with jumping knowledge networks Adversarial learning with multi-scale loss for skin lesion segmentation Interpretable multimodality embedding of cerebral cortex using attention graph network for identifying bipolar disorder Gnnexplainer: Generating explanations for graph neural networks Hierarchical graph representation learning with differentiable pooling Graph contrastive learning with augmentations Multi-scale enhanced graph convolutional network for early mild cognitive impairment detection Whole-brain anatomical networks: does the choice of nodes matter? 
Cells are actors: Social network analysis with classical ml for sota histology image classification Linking convolutional neural networks with graph convolutional networks: application in pulmonary artery-vein separation Joint fully convolutional and graph convolutional networks for weakly-supervised segmentation of pathology images An end-to-end deep learning architecture for graph classification Graph convolutional networks: a comprehensive review Deep representation learning for multimodal brain networks Integrating heterogeneous brain networks for predicting brain disease conditions Airway anomaly detection by prototype-based graph neural network Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution Graph neural networks: A review of methods and applications A comprehensive review for breast histopathology image analysis using classical and deep neural networks Cgc-net: Cell graph convolutional network for grading of colorectal cancer histology images Dual graph convolutional networks for graph-based semi-supervised classification