key: cord-0529239-o4vjjh8r authors: Jiang, Shunyu; Feng, Fuli; Chen, Weijian; Li, Xiang; He, Xiangnan title: Structure-Enhanced Meta-Learning For Few-Shot Graph Classification date: 2021-03-05 journal: nan DOI: nan sha: 807ba6b8245948804d21429a375e283387bc45d2 doc_id: 529239 cord_uid: o4vjjh8r

Graph classification is a highly impactful task that plays a crucial role in a myriad of real-world applications such as molecular property prediction and protein function prediction. Aiming to handle new classes with limited labeled graphs, few-shot graph classification has become a bridge between existing graph classification solutions and practical usage. This work explores the potential of metric-based meta-learning for solving few-shot graph classification. We highlight the importance of considering structural characteristics in the solution and propose a novel framework that explicitly considers the global structure and the local structure of the input graph. An implementation upon GIN, named SMF-GIN, is tested on two datasets, Chembl and TRIANGLES, where extensive experiments validate the effectiveness of the proposed method. The Chembl dataset is constructed to fill the gap of lacking a large-scale benchmark for few-shot graph classification evaluation, and is released together with the implementation of SMF-GIN at: https://github.com/jiangshunyu/SMF-GIN.

Graphs are widely used to represent relational data in plenty of real-world applications such as social networks [20], chemical molecules [9], and biological proteins [8], due to their strong capability of modeling relationships and structure. A surge of attention has been focused on graph-based analytical tasks, including graph classification, such as predicting the chemical properties of molecules [9]. Existing graph classification solutions largely follow supervised learning, where sufficient labeled graphs are required for each class [24]. Despite the success on well-labeled classes, many real-world scenarios are faced with few-shot classes where only limited labeled graphs are available, due to the high cost of labeling [2] or the occurrence of new classes [28]. For instance, in the chemical-pharmaceutical industry, the requirement of predicting new molecule properties keeps occurring, e.g., predicting the inhibition against COVID-19, where the labeled molecules are very limited. To address this dilemma, it is imperative to shift attention to few-shot graph classification. For such a few-shot classification task, a natural way forward is to extend existing meta-learning solutions for conventional few-shot tasks such as image classification [5] to graphs by additionally considering the graph structure. Indeed, vanilla model-agnostic gradient-based meta-learning (MAML) [6] has been extended to graph classification [14] by utilizing a graph neural network (GNN) as the backbone model. The key idea is to learn a GNN initialization that can be well generalized and fast adapted to new classes, using the generalization performance as the supervision, which simply counts on the GNN to encode graph structure. In addition to the prevalence of gradient-based methods like MAML in meta-learning, there are also works that demonstrate the effectiveness of metric-based meta-learning [11] in conventional few-shot classification. Its target is to learn a sample representation and distance measure that can be generalized to new classes, where the prediction is made according to the distance to the labeled samples. Along this line, Chauhan et al. [2]
propose a graph distance measure in the spectral domain, which validates the merit of considering graph structure. However, the measure in the spectral domain might not fully utilize the graph structure, making it inflexible and hard to adjust according to the graph properties. Therefore, it is worthwhile to explore meta-learning for few-shot graph classification with explicit consideration of graph structure, which has received relatively little scrutiny. We argue that the structural information of graph data consists of two parts, namely local structure and global structure. On the one hand, some attributes of a graph depend on its substructures, that is, the local structure plays a decisive role in predicting the graph's attribute class. For example, as shown in Figure 1(a), the structures of Na2CO3 and K2CO3 both contain carbonate groups (i.e., CO3), making them exhibit similar characteristics such as weak alkalinity when dissolved in water. As such, the local structure should be well represented when measuring the distance between graphs. On the other hand, as GNNs have become a default choice for extracting graph representations, the global structure of a graph also affects whether the distance calculated from the representations can be well generalized. As shown in Figure 1(b), the graph representation at deep layers will be over-smoothed for C6H6 with a small diameter [3], while for C6H12 with a large diameter, the graph representation at shallow layers will be insufficient for describing the overall graph [13]. As the graphs in new classes may have distinct global structure and local structure, it is even more challenging to model them both effectively. In this work, we propose a metric-based meta-learning framework for few-shot graph classification that explicitly considers the structural characteristics of graph data when calculating graph representations. In this framework, a multi-layer GNN is employed to encode the input graph, where different layers capture information at different granularities. To better account for the global structure, instead of simply aggregating (e.g., concatenating) the representations from different GNN layers, we devise an adaptive aggregation module that assigns weights to different layers in a graph-specific manner. As to the local structure, the framework further extracts the representations of representative substructures (e.g., the scaffold [23] of a molecule), which are adaptively fused into the overall graph representation. In this way, the framework can achieve a more accurate distance metric to enhance graph classification on new classes. Moreover, the framework is transparent, shedding light on how the global structure and local structure affect the classification. In particular, we implement the proposed framework over a state-of-the-art model for graph classification, the Graph Isomorphism Network (GIN) [24]. We evaluate our solution, named SMF-GIN, on existing datasets for few-shot graph classification such as TRIANGLES [2]. However, the existing datasets not only contain few classes but are also small in size, making it difficult to truly validate the performance of the model. To facilitate the research and evaluation of the few-shot graph classification task, we extract a large-scale multi-class graph classification dataset, named Chembl, from the Chembl chemical molecule database [23], which we believe can serve as a general benchmark dataset.
Extensive experiments on TRIANGLES and Chembl validate the effectiveness and rationality of the proposed method. The main contributions of our work are as follows:
• Proposing a metric-based meta-learning solution for the few-shot graph classification task, which explicitly encodes the local structure and global structure of a graph.
• Constructing a general large-scale multi-class benchmark dataset to facilitate future research on few-shot graph classification.
• Conducting extensive experiments on two datasets to justify the effectiveness of our design.

In this section, we first describe the problem setup (Section 2.1), followed by the introduction of the proposed framework (Section 2.2). Lastly, we detail the consideration of graph local structure and global structure. First of all, we define the problem of few-shot graph classification, where a set of graphs $\{\mathcal{G}_1, \mathcal{G}_2, \cdots, \mathcal{G}_m\}$ and their labels $\{y_1, y_2, \cdots, y_m\}$ are given. Each graph $\mathcal{G} = (A, X)$ is described by an adjacency matrix $A \in \mathbb{R}^{n \times n}$ and a node feature matrix $X \in \mathbb{R}^{n \times d}$, where $n$ denotes the number of nodes in $\mathcal{G}$ and $d$ is the dimension of the node features. Then, according to the labels, we split the graphs into a train set $\mathcal{D}_{train} = \{(\mathcal{G}, y)\}$ and a test set $\mathcal{D}_{test} = \{(\mathcal{G}, y)\}$. Notice that $\mathcal{D}_{train}$ and $\mathcal{D}_{test}$ must have no common classes. The target is to learn a classifier from $\mathcal{D}_{train}$ (i.e., meta-train) that can be generalized to making predictions for the classes in $\mathcal{D}_{test}$ given only a few labeled graphs (i.e., meta-test). In the meta-train and meta-test phases, we utilize the episodic method of task learning [6] from conventional few-shot learning. Taking the meta-train phase as an example, we randomly sample meta-tasks, each of which contains a support set $\mathcal{S} = \{(\mathcal{G}_s, y_s)\}$ and a query set $\mathcal{Q} = \{(\mathcal{G}_q, y_q)\}$. Given the labeled support data, our goal is to predict the labels of the query data. Generally, there are $N$ classes in the support set, and each class has $K$ samples. This is the $N$-way $K$-shot graph classification problem we aim to solve. For each meta-task, its support set and query set have the same classes, but the number of samples in each class is not necessarily the same.

Inspired by the metric-based meta-learning methods in the field of computer vision [22], the framework (see Figure 2) learns a GNN as an encoder to project a graph $\mathcal{G}$ into a latent representation $h = f(\mathcal{G} \mid \Theta)$, and makes predictions based on the nearest-neighbor principle by calculating the distance between the query graph and the graphs in the support set according to their latent representations. For an $N$-way $K$-shot meta-task, by feeding the support set through the encoder, we obtain the latent representations $h_{i,j}$ ($i \in [1, N]$, $j \in [1, K]$). From the representations of the graphs in each class, we obtain a class centroid $c_i$, which is formulated as $c_i = \frac{1}{K} \sum_{j=1}^{K} h_{i,j}$. Similarly, we extract the representations of the query graphs, $h_q$ ($q \in [1, Q]$, where $Q$ is the size of the query set). We measure the distance between each query graph and each class centroid, and the nearest-neighbor classification method is utilized to predict the label of each query graph: $\hat{y}_q = \arg\min_{i} d(h_q, c_i)$, where $d(\cdot)$ is a distance metric such as the Euclidean distance. The parameters of the encoder are learned by optimizing a classification loss over the query graphs, $\mathcal{L} = \sum_{q=1}^{Q} \ell(\hat{y}_q, y_q)$, where $y_q$ is the label of the query graph and $\ell$ is selected to be the widely used cross-entropy loss.
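To make the episodic procedure concrete, the following is a minimal PyTorch sketch of one meta-training episode under the nearest-centroid formulation described above. It assumes the GNN encoder has already produced graph embeddings; the function name, tensor layout, and the use of a softmax over negative distances inside the cross-entropy loss are illustrative choices of this sketch, not details taken from the released SMF-GIN implementation.

```python
import torch
import torch.nn.functional as F

def episode_loss(support_emb, support_labels, query_emb, query_labels):
    """One N-way K-shot episode with a nearest-centroid classifier (sketch).

    support_emb: [N*K, D] graph embeddings from the GNN encoder.
    support_labels: [N*K] class indices in [0, N).
    query_emb: [Q, D] query graph embeddings; query_labels: [Q] class indices.
    """
    n_way = int(support_labels.max().item()) + 1
    # Class centroid: mean embedding of each support class.
    centroids = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                          # [N, D]
    # Euclidean distance from every query graph to every class centroid.
    dists = torch.cdist(query_emb, centroids)  # [Q, N]
    # Nearest-neighbor prediction over the centroids.
    preds = dists.argmin(dim=1)
    # Cross-entropy over the softmax of negative distances (assumed loss form).
    loss = F.cross_entropy(-dists, query_labels)
    acc = (preds == query_labels).float().mean()
    return loss, acc
```

During meta-training this loss is back-propagated through the encoder; at meta-test time only the distance computation and the nearest-centroid prediction are used, after the representation transformation described next.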
Meta-test. Given a graph $\mathcal{G}$ from a meta-task in the meta-test phase, instead of directly using the extracted representation $h = f(\mathcal{G} \mid \Theta)$, we consider a representation transformation to facilitate the generalization to new classes by centering and scaling, inspired by the superior performance of SimpleShot [22]. 1) Centering. As the encoder is trained over graphs from the training classes, it may induce some bias into the representations of graphs from unseen classes. Therefore, we subtract the mean representation over all graphs that occurred in meta-train to mitigate such representation drift. Formally, $\bar{h} = h - \frac{1}{|\mathcal{D}_{train}|} \sum_{\mathcal{G}' \in \mathcal{D}_{train}} f(\mathcal{G}' \mid \Theta)$. 2) Scaling. We further conduct L2-normalization over $\bar{h}$ so that the ranges of distances across different classes are close to each other. That is, we use $\hat{h} \leftarrow \bar{h} / \|\bar{h}\|_2$ to calculate the class centroids and graph distances in meta-test.

In particular, we choose GIN as the backbone of the encoder, which is one of the state-of-the-art models for graph classification. Upon the GIN model, we then detail our design for considering global structure and local structure when learning the latent representation $h$. It should be noted that the scheme can also be applied to other GNN graph classification models such as DiffPool [26]. In the following, we omit the graph subscript for brevity of notation. As shown in Figure 3, we devise two modules upon GIN to encode the global structure and local structure and enhance the graph representation. Generally, if there are $L$ layers in GIN, each node will aggregate its $L$-hop neighbors, and the graph representation at each layer, $\{h^{(1)}, h^{(2)}, \cdots, h^{(L)}\}$, is realized by a READOUT function such as average pooling, which aggregates the node representations. By concatenating the representations of all layers, we obtain the entire graph representation. Formally, $h_{\mathcal{G}} = \mathrm{CONCAT}(h^{(1)}, h^{(2)}, \cdots, h^{(L)})$ (Equation 5). Intuitively, for graphs with different global structure, the representations at different layers should be highlighted differently rather than simply treated equally. For instance, the representation of a graph with a small diameter should highlight the representations from shallow layers to avoid the impact of over-smoothing. In order to make full use of the global structure, we use a global structure attention to learn weights $\{a_1, a_2, \cdots, a_L\}$ that model the importance of different layers: $h_{\mathcal{G}} = \mathrm{CONCAT}(a_1 h^{(1)}, a_2 h^{(2)}, \cdots, a_L h^{(L)})$ (Equation 6). In this way, the representation from a layer adaptively contributes to the representation of different graphs. In Section 2.3.3, we detail the implementation of the global structure attention. Note that we follow the previous GNN models for graph classification, which typically use concatenation to aggregate the representations across different layers.

We believe that the attribute characteristics of graph data depend on the substructure characteristics. In other words, the local substructures in the graph play a decisive role in the label prediction of the entire graph. In many domains, the crucial substructures can be recognized according to domain knowledge, such as the scaffold of a molecule. Let $\mathcal{S} = \{\mathcal{S}_1, \cdots, \mathcal{S}_M\}$ denote the crucial substructures extracted from graph $\mathcal{G}$. For each substructure, e.g., $\mathcal{S}_1$, we use the same GIN encoder to calculate its latent representation at each layer and obtain the substructure representation $h_{\mathcal{S}_1}$ according to Equation 6. Considering that a substructure can contribute unequally to the prediction of different graphs, we devise a local structure attention to model the weight of each substructure and obtain the final representation of the graph: $h = b_0 h_{\mathcal{G}} + \sum_{m=1}^{M} b_m h_{\mathcal{S}_m}$ (Equation 7), where $\{b_0, b_1, \cdots, b_M\}$ is calculated by an attention model (see Section 2.3.3).
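The following is a minimal PyTorch sketch of how the structure-enhanced representation of Equations 5-7 could be assembled, assuming the per-layer READOUT vectors of the graph and of its substructures are already computed. The module name, tensor shapes, and the expression of Equation 7 as a weighted sum are assumptions of this sketch rather than details of the released code; the attention modules that produce the weights are abstracted into callables and discussed in Section 2.3.3.

```python
import torch
import torch.nn as nn

class StructureEnhancedReadout(nn.Module):
    """Sketch of Equations 5-7: layer-wise (global) and substructure-wise (local) weighting.

    `attn_global` and `attn_local` are assumed to map a stack of
    representations [M, D] to normalized weights [M] (see Section 2.3.3).
    """
    def __init__(self, attn_global, attn_local):
        super().__init__()
        self.attn_global = attn_global
        self.attn_local = attn_local

    def graph_repr(self, layer_readouts):
        # layer_readouts: list of L tensors, each [D] (READOUT of one GIN layer).
        stacked = torch.stack(layer_readouts)          # [L, D]
        a = self.attn_global(stacked)                  # [L] layer importances
        # Equation 6: weighted concatenation of the per-layer readouts.
        return torch.cat([a[l] * stacked[l] for l in range(stacked.size(0))])

    def forward(self, graph_layer_readouts, sub_layer_readouts):
        # Representation of the entire graph and of each substructure (Equation 6).
        h_graph = self.graph_repr(graph_layer_readouts)               # [L*D]
        h_subs = [self.graph_repr(r) for r in sub_layer_readouts]     # M x [L*D]
        stacked = torch.stack([h_graph] + h_subs)                     # [M+1, L*D]
        b = self.attn_local(stacked)                                  # [M+1]
        # Equation 7: attention-weighted fusion of graph and substructure representations.
        return (b.unsqueeze(1) * stacked).sum(dim=0)                  # [L*D]
```

Note that the same `graph_repr` routine is reused for the entire graph and for every substructure, mirroring the shared GIN encoder in the framework.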
If not otherwise specified, we aggregate the entire graph representation $h_{\mathcal{G}}$ and the substructure representations through mean-pooling, which is a common setting in the previous usage of attention models [26] and is sufficient for distilling the important signals for classification (see detailed results in Section 3.3.4). For both the global structure attention and the local structure attention, the target is to learn a group of weights (i.e., $a$ and $b$) with a set of representations as inputs, and to aggregate the representations according to the learned weights. Considering that graphs change dramatically across different domains, we try five different attention models to calculate the weights. Without loss of generality, we elaborate the attention models with the global structure attention as an example, where the inputs are $H = \{h^{(1)}, h^{(2)}, \cdots, h^{(L)}\}$.

Learned weight. The simplest choice is treating the target weights directly as additional model parameters, which are updated as normal parameters during model training. Formally, we introduce a learnable weight vector $a = [a_1, a_2, \cdots, a_L]$. As the weights are shared across different graphs, this method is suitable for scenarios where the global structure properties are consistent across graphs, especially scenarios without enough labeled data, since it introduces the minimal number of additional model parameters.

Vanilla attention. Inspired by the success of the vanilla attention model [4], it can be a better choice for implementing our attentions, which is formulated as $a_l = \frac{\exp\left(q^{\top}\tanh(W h^{(l)} + b)\right)}{\sum_{l'=1}^{L}\exp\left(q^{\top}\tanh(W h^{(l')} + b)\right)}$, where $W$, $b$, and $q$ are model parameters to be learned.

Self-attention. Furthermore, to distinguish the different impacts among representations, we reference the multi-head self-attention mechanism [18]. We follow the standard implementation of multi-head self-attention [18], where the input representations are fed into the model as a sequence. Note that the learned weights have been absorbed into the output representations.

MLP. Similarly, we can embed the weights into a Multi-Layer Perceptron (MLP) to learn the aggregation of the input representations, which is formulated as $z^{(t)} = \sigma\left(W^{(t)} z^{(t-1)}\right)$ for $t = 1, \cdots, T$, where $T$ is the number of MLP layers, $z^{(0)} = \mathrm{CONCAT}(H)$ is a concatenation of the input representations, and $z^{(T)}$ denotes the output representation.

Transformer. To further enhance the representation ability, we can also implement the attentions with a Transformer layer, which is a combination of multi-head self-attention and an MLP. We refer to the original paper [18] for the detailed formulation of a Transformer layer.

On the basis of these five different calculation methods, we consider the characteristics of both global structure and local structure, and implement our framework with global structure attention and local structure attention simultaneously, named SMF-GIN. In particular, SMF-GIN first calculates the representation of the entire graph and of each substructure according to Equation 6, and then calculates the final representation for making the classification according to Equation 7. Moreover, we also study three variants of SMF-GIN:
• SMF-GIN-g, which only includes the global structure attention, i.e., taking the output of Equation 6 as the final representation.
• SMF-GIN-l, which only considers the characteristics of local structure. As compared to SMF-GIN, SMF-GIN-l calculates the representation of the entire graph and of each substructure according to Equation 5.
• SMF-GIN-e. Lastly, we ensemble SMF-GIN-g and SMF-GIN-l through a voting mechanism, named SMF-GIN-e: in the meta-test phase, SMF-GIN-e averages the predictions of a query graph (i.e., the distances to each class centroid) from SMF-GIN-g and SMF-GIN-l.
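All of the variants above rely on an attention model from Section 2.3.3 to produce the layer weights $a$ or the substructure weights $b$. As a concrete illustration, the sketch below shows two of the five options, the learned-weight model and the vanilla attention model, as small PyTorch modules that map a stack of representations to normalized weights; the softmax normalization, hidden size, and parameter names are assumptions of this sketch rather than details of the SMF-GIN code.

```python
import torch
import torch.nn as nn

class LearnedWeight(nn.Module):
    """Learned-weight attention: one weight per position, shared across graphs."""
    def __init__(self, num_inputs):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))

    def forward(self, reps):
        # reps: [M, D]; unused, since the weights do not depend on the input graph.
        return torch.softmax(self.w, dim=0)   # [M]; softmax normalization is an assumption

class VanillaAttention(nn.Module):
    """Vanilla attention: graph-specific weights scored from each representation."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)          # plays the role of W and b
        self.q = nn.Parameter(torch.randn(hidden))  # query vector q

    def forward(self, reps):                        # reps: [M, D]
        scores = torch.tanh(self.proj(reps)) @ self.q   # [M]
        return torch.softmax(scores, dim=0)             # [M]
```

Either module can be plugged in as the weight-producing callable of the readout sketch above; the self-attention, MLP, and Transformer options differ in that their weights are absorbed into output representations, which are then pooled (mean-pooling by default, as described above).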
In this section, we introduce the experimental datasets, baselines, and implementation details of our methods. In our experiments, we focus on four research questions: (1) Could our framework SMF-GIN, considering the characteristics of both global structure and local structure, achieve better results? (2) To what extent does the consideration of global structure and local structure improve the classification results? (3) What are the specific performances of the different attention models in SMF-GIN-g and SMF-GIN-l? (4) What is the impact of hyper-parameters on SMF-GIN-g and SMF-GIN-l? In response to these four questions, we conduct experiments and answer them in the analysis of the experimental results.

Datasets. We conduct experiments on two datasets: the multi-class Chembl dataset and the public TRIANGLES dataset. Detailed information about these two datasets is illustrated in Table 1.

Chembl. In order to promote the investigation and evaluation of few-shot graph classification, we construct a general large-scale multi-class graph classification benchmark dataset from the Chembl chemical molecule database [23], which contains approximately 1.88 million molecules and 12,482 molecular targets. Specifically, we select targets whose type is protein, whose confidence score is higher than 0, and for which each molecule contains only one target. For better training, we keep the targets with more than 500 corresponding molecules. We take the resulting 273 molecular targets as the categories of our dataset and randomly select 500 samples from each category, yielding a final Chembl dataset of 136,500 molecules in total.

TRIANGLES. This dataset contains ten different graph classes numbered from 1 to 10, corresponding to the number of triangles in each graph in the dataset. Each class of this dataset contains 201 graphs, for a total of 2,010 graphs.

Baselines. For the N-way K-shot graph classification task, we utilize five state-of-the-art baseline methods to compare with our methods: Pre-GNN, GIN, MAML, Supclass, and SimpleShot. Detailed information about these five baselines is as follows:
• Pre-GNN [9]: This method pre-trains an expressive GNN at the level of individual nodes as well as entire graphs so that the GNN can learn useful local and global representations at the same time. We utilize three pre-trained models: Pre-context, Pre-masking, and Pre-infomax. In the test phase, N-way K-shot tasks are used to fine-tune these three pre-trained models.
• GIN [24]: This baseline method follows a neighborhood aggregation strategy, where the representation of a node is iteratively updated by aggregating the representations of its neighbors. In the training phase, we utilize the train-set graphs of all categories to train the encoder and the classification layer. In the test phase, we discard the trained classification layer and retain the encoder. We then use the N-way K-shot task to fine-tune the encoder and a new classification layer.
• MAML [14]: We apply the gradient-based meta-learning method [6] to GNN models. More specifically, we leverage GIN [24] as the graph encoder backbone and meta-learning as a training paradigm to rapidly capture task-specific knowledge in graph classification tasks and transfer it to new tasks.
• Supclass [2]: This method is customized for few-shot graph classification.
In the training stage, they compute prototype graphs for each class and then cluster the prototype graphs to produce superclasses. After that, they predict both the original class and the superclass of each graph. In the test stage, they only update the classifier based on the classification of the original classes.
• SimpleShot [22]: We apply the metric-based meta-learning method to the GNN model without considering graph structure characteristics. Specifically, we utilize GIN as the encoder to obtain the class representations in the support set, and then build a metric space shared with the query set. We also perform the representation transformation (centering and scaling) described in Section 2.2.

Table 2: Accuracies with standard deviations of the baseline methods and our framework. We test 1000 and 500 N-way K-shot tasks on Chembl and TRIANGLES, respectively. The bold numbers denote the best results.

Experimental implementation details. Because the numbers of categories in the Chembl and TRIANGLES datasets differ greatly, we perform the 5-way 5-shot classification task on the Chembl dataset and the 3-way 5-shot classification task on the TRIANGLES dataset, where the number of samples per class in the query set is 15. The split of the train set, validation set, and test set is illustrated in Table 1. We utilize a 5-layer GIN model as the encoder (each layer contains a 2-layer MLP). Among the layer representations, we CONCAT the graph representations of the last 4 layers as the overall graph representation, because the first layer corresponds to the original node features. We set the dimension of the hidden layer to 64 and the learning rate to 0.001. During the training process, a total of 700 iterations are performed, and validation is performed every 20 epochs. The model with the best validation results is selected for testing to obtain the final classification result. We implement the proposed solution on top of Pre-GNN through PyTorch 1.0.1. As for the substructures, we adopt the scaffold of each molecule in the Chembl dataset and a random split of each graph in the TRIANGLES dataset.

Our experimental results are illustrated in Table 2. Pre-GNN is not suitable for the TRIANGLES dataset, because this model is designed to process chemical molecule and biological protein graphs, so it can only be applied to the Chembl dataset. We can observe that the three variants of Pre-GNN perform badly on the N-way K-shot classification task. Here, we believe that there is a great difference between the datasets used to pre-train Pre-GNN and the graph structure of the dataset we finally test on, so even fine-tuning the model on the N-way K-shot task does not achieve good results. Regarding the utilization of MAML in GNNs to solve the few-shot graph classification task, on the Chembl dataset MAML is not as effective as simply using GIN, while on the TRIANGLES dataset the result is just the opposite. The reason for this phenomenon is the different structure of the graphs: the differences between the Chembl categories are large, whereas the TRIANGLES classes only differ in the number of triangles in the graph, a small structural difference. This is consistent with the general interpretation of MAML: its generalization ability is poor when there are large differences among the sample data. On both the Chembl and TRIANGLES datasets, the performance of MAML is higher than that of Supclass, which is also consistent with the results in [14].
Among all models that do not consider structural characteristics, the best performer is SimpleShot, the application of the metric-based meta-learning method to the GNN model. It can be seen that sharing a metric space between graph data is effective for the N-way K-shot classification task. For our framework SMF-GIN, the performance is superior to that of all baselines except SimpleShot, because packaging the characteristics of both global structure and local structure into a single end-to-end solution can interfere with the training process. Therefore, the answer to our first question is yes. For the second question, we also implement the SMF-GIN variants SMF-GIN-g and SMF-GIN-l, which include the global structure attention and the local structure attention, respectively. We can make the following observation: compared with the baselines and SMF-GIN, SMF-GIN-g and SMF-GIN-l both achieve better performance, which shows that it is easier to capture graph structural characteristics when considering local structure or global structure separately. At the end of the previous section, we ensembled SMF-GIN-g and SMF-GIN-l into SMF-GIN-e through a voting mechanism. Here we consider two ways of aggregating: SMF-GIN-e (self) and SMF-GIN-e (best). In particular, SMF-GIN-e (self) refers to the results of using the self-attention mechanism in both SMF-GIN-g and SMF-GIN-l, while SMF-GIN-e (best) refers to the results of adopting the optimal attention model in both SMF-GIN-g and SMF-GIN-l. As shown in Table 2, SMF-GIN-e (best) gets the best result and SMF-GIN-e (self) gets the second-best result. This shows that it is important to make full use of the specific structure of graphs in the few-shot graph classification task.

In the previous section, we put forward five different attention models for both the global structure attention and the local structure attention. In this section, we take the TRIANGLES dataset as an example and compare the performance differences of the five attention models on SMF-GIN-g and SMF-GIN-l through experiments. For the self-attention mechanism, we also study the influence of some hyper-parameters and pooling strategies on the final result. As illustrated in Figure 4, we can see the performance of these different attention models on SMF-GIN-g and SMF-GIN-l. In SMF-GIN-g, the vanilla attention omits the interactions across different layers' representations, so its generalization ability for each layer's representation is relatively poor. Simply using the learned weights or MLP layers to accommodate each layer's representation is not as effective as the self-attention and Transformer mechanisms, because shallow representations (e.g., at the first layer) and deep representations (e.g., at the last layer) have different importance for different graphs, and the self-attention and Transformer mechanisms can further enhance the representation ability and distinguish the different impacts among representations well. In SMF-GIN-l, due to the poor generalization ability of the vanilla attention mechanism for the representations of the entire graph and the substructures, its result is far lower than those of the other attention models, and its performance cannot be illustrated in Figure 4. Simply using the learned weights makes it difficult to fit different substructures; more parameters are conducive to representing the entire graph and the substructures in SMF-GIN-l, so that the effectiveness of the MLP method is higher than that of the self-attention and Transformer mechanisms. As illustrated in Figure 5, we compare layer=1 and layer=2 for the self-attention mechanism with head={1, 2} on both SMF-GIN-g and SMF-GIN-l.
According to Figure 5, we observe that the effect of layer depth on model performance is the same in SMF-GIN-g and SMF-GIN-l: with layer=1, the performance of both is higher than with layer=2, whether the number of heads is 1 or 2. The reason is that, in the self-attention mechanism, the representations of the entire graph and the substructures, as well as the per-layer representations of the encoder, do not need deep aggregation, and a shallow layer can already express them well. Comparing the impact of the number of heads on performance in Figure 5, we find that when the number of heads increases, the performance of both SMF-GIN-g and SMF-GIN-l greatly improves. Therefore, we fix layer=1, vary the number of heads, and observe the performance trend. As illustrated in Figure 6, with the increase of the number of heads, SMF-GIN-g and SMF-GIN-l show the same trend, and the performance reaches its peak when head=2. Overall, when head≤8, the fluctuation of SMF-GIN-g is larger than that of SMF-GIN-l, indicating that the different layers' representations in SMF-GIN-g are more sensitive to the number of heads. When head>8, the performance drops slowly, indicating that a larger number of heads is harmful to this model. Recall that we take mean-pooling as the default choice to aggregate the entire graph representation and the substructure representations (c.f. Equation 7). We then investigate the impact of different pooling strategies on the final prediction results. We fix the previously analyzed hyper-parameters at their best choices, that is, {layer=1, head=2}, and select max-pooling, mean-pooling, and first-pooling, where first-pooling directly takes the first element of the output, i.e., the entire graph's representation $h_{\mathcal{G}}$. As illustrated in Figure 7, we can see that max-pooling has the best performance, mean-pooling is second, and first-pooling is the worst, because max-pooling acts more like selecting the best features compared to the others, picking the representations with higher discriminative power for classification and providing non-linearity, which naturally yields the highest performance.

In this section, we briefly introduce the research lines relevant to our work: graph classification, few-shot learning, and graph few-shot learning. Graph Neural Networks (GNNs) were first introduced in [7]. This algorithm follows the neighborhood aggregation framework and computes the representation vector of a node by recursive aggregation and transformation of the feature vectors of adjacent nodes. As the main branch in recent years, GNNs have been successfully applied to graph classification. [10] proposed the Graph Convolutional Network (GCN) and achieved inspiring results based on feature aggregation from neighboring nodes. [19] introduced the attention mechanism into graph convolutional operations (GAT). [26] proposed DIFFPOOL, a differentiable graph pooling module that can generate hierarchical representations of graphs and can be combined with various graph neural network architectures in an end-to-end fashion. [24] developed the simple Graph Isomorphism Network (GIN), which is provably the most expressive among this class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. When these graph classification methods encounter novel classes, the existing GNN models always need to re-learn their parameters to incorporate the new information, and if there are few data samples in each new class, the performance of the GNN model suffers a catastrophic decline.
However, our framework uses a metric-based meta-learning method with GIN as the model, and it can achieve good results when there are few data samples for the novel classes. Few-shot learning in the computer vision community was first introduced by [12]. Its goal is to generalize a model to a new task rapidly with only a small number of samples, after learning from a large amount of data in certain categories. Various learning algorithms have been proposed in the image domain, and recent related works can be classified into two categories: gradient-based methods and metric-based methods. Gradient-based methods. These methods aim to learn a better initialization of the model parameters that can be updated by a few gradient steps in future tasks to adapt quickly to new tasks [6][17], or to train a parameter generator for a task-specific classifier [15]. Metric-based methods. The purpose of these methods is to learn a metric space from the training tasks. When a new task appears, this metric space is shared with the new task [21][16]. These methods have achieved inspiring results in image classification tasks, but they have not been well explored for graph data. In contrast, our framework adopts the metric-based method and performs well in the graph classification task. Motivated by this challenge, many scholars have recently carried out research on few-shot learning over graph data. [27] solved the sparsity problem by adding regularization to the loss function of the graph neural network in the few-shot node classification task. [25] proposed a graph few-shot learning (GFL) algorithm based on the prior knowledge of auxiliary graphs, which transfers the knowledge learned from the auxiliary graphs to new target graphs to improve the effectiveness of semi-supervised node classification. [1] adopted the traditional meta-learning method to optimize the shared parameter initialization of a local link prediction model. Currently, these few-shot learning methods are widely used in node classification and link prediction tasks, but they are not suitable for graph classification tasks. [14] adopted gradient-based meta-learning methods to deal with the few-shot graph classification task, updating the initialization parameters of the model according to the different meta-tasks in order to adapt quickly to new tasks. [2] proposes a graph distance measure in the spectral domain, which validates the merit of considering graph structure. However, these methods focus on algorithms and models without making good use of the structural characteristics of graph data. In order to make up for this shortcoming, we put forward our framework under the premise of explicitly considering the structure of graph data.

This paper highlights structure-enhanced meta-learning for few-shot graph classification. We propose a framework, named SMF-GIN, for the N-way K-shot graph classification problem. It uses GIN as an encoder to learn graph representations, where two attention mechanisms are designed to encode the global structure and the local structure. Five attention models are tested under the proposed framework. Extensive experimental analysis shows that, for few-shot learning on graphs, it is important to make full use of the structural characteristics of graphs. Meanwhile, a large-scale benchmark dataset is released to facilitate future research. In the future, we will explore the consideration of more properties of graph structure in the few-shot setting. In addition, we will test different backbone models such as DiffPool [26] and GAT [19].
Furthermore, we will investigate more distance measures for graphs.

References
Metagraph: Few shot link prediction via meta learning
Few-shot learning on graphs via super-classes based on graph spectral measures
Measuring and relieving the over-smoothing problem for graph neural networks from the topological view
Semi-supervised user profiling with heterogeneous graph attention networks
A baseline for few-shot image classification
Model-agnostic meta-learning for fast adaptation of deep networks
A new model for learning in graph domains
Inductive representation learning on large graphs
Strategies for pre-training graph neural networks
Semi-supervised classification with graph convolutional networks
Siamese neural networks for one-shot image recognition
One-shot learning of object categories
DeepGCNs: Can GCNs go as deep as CNNs?
Few-shot graph classification with model agnostic meta-learning
Meta-learning with latent embedding optimization
Prototypical networks for few-shot learning
Meta-transfer learning for few-shot learning
Graph attention networks
Deep graph infomax
Matching networks for one shot learning
SimpleShot: Revisiting nearest-neighbor classification for few-shot learning
MoleculeNet: A benchmark for molecular machine learning
How powerful are graph neural networks?
Graph few-shot learning via knowledge transfer
Hierarchical graph representation learning with differentiable pooling
Few-shot classification on graphs with structural regularized GCNs
Meta-GNN: On few-shot node classification in graph meta-learning

Acknowledgment
This work is supported by the National Key Research and Development Program of China (2020AAA0106000) and the National Natural Science Foundation of China (U19A2079).