key: cord-0134721-l5p4fg7p authors: Apicella, Andrea; Isgro, Francesco; Pollastro, Andrea; Prevete, Roberto title: Adaptive Filters in Graph Convolutional Neural Networks date: 2021-05-21 sha: 6c2b055238717bb37145b140511c0c595dd27854 doc_id: 134721 cord_uid: l5p4fg7p

Over the last few years, we have witnessed an increasing availability of data generated from non-Euclidean domains, which are usually represented as graphs with complex relationships, and Graph Neural Networks (GNNs) have attracted considerable interest because of their potential in processing graph-structured data. In particular, there is a strong interest in exploring the possibility of performing convolution on graphs using an extension of the GNN architecture, generally referred to as Graph Convolutional Neural Networks (ConvGNNs). Convolution on graphs has been achieved mainly in two forms: spectral and spatial convolutions. Owing to its higher flexibility in exploring and exploiting the graph structure of data, the spatial approach has recently attracted increasing interest. The idea of adapting a network's behaviour to the inputs it processes, in order to maximize overall performance, has aroused much interest in the neural network literature over the years. This paper presents a novel method to adapt the behaviour of a ConvGNN to its input: spatial convolution on graphs is performed using input-specific filters, which are dynamically generated from the nodes' feature vectors. The experimental assessment confirms the capabilities of the proposed approach, which achieves satisfactory results using a low number of filters.
In the last few decades, Convolutional Neural Networks (CNNs) have gained much interest due to their potential and versatility in addressing a wide range of machine learning problems (LeCun et al., 1998; Hesser et al., 2021), such as image processing (Krizhevsky et al., 2012; Yu et al., 2021a; Yang et al., 2021) and pattern recognition (Kim, 2014; Cui et al., 2021; Lyu et al., 2021), where they have achieved great success. The potential of CNNs lies in extracting and processing local information by convolving the input data with sets of trainable, fixed-size filters. However, the design of the convolution operation in CNNs allows only regular data to be processed, whereas in the real world a considerable amount of data naturally lies on non-Euclidean domains and requires different processing techniques. These data are often represented by graph-based structures. Graph structures make several standard data processing techniques difficult to apply; for example, classic CNNs cannot be used because the number of neighbours varies from node to node (unlike regular data, where the filter properties fix the number of neighbours of each node). This has led to new processing techniques such as Graph Neural Networks (GNNs), which have gained great interest in recent years. The first neural networks operating on input graphs (Sperduti and Starita, 1997; Gori et al., 2005; Scarselli et al., 2008), generally referred to as Recurrent Graph Neural Networks (RecGNNs), were based on message passing architectures, in which an iterative process learns, for each node, a representation of the information in its neighbourhood. The learned node representations are then used for classification or regression tasks. However, as the size of the graphs increased, these approaches became increasingly computationally expensive, which represented a new challenge to overcome.
Building on the great success of CNNs, GNNs have inherited the convolution operation, giving rise to Graph Convolutional Neural Networks (ConvGNNs), which follow two different approaches. The first consists of spectral methods (see, for example, (Li et al., 2018b; Levie et al., 2018)), which perform convolution based on graph signal processing techniques. The second consists of spatial methods (see, for example, (Atwood and Towsley, 2016; Hechtlinger et al., 2017)), which instead perform convolution using the spatial information of the data, similarly to classical CNNs. ConvGNNs share the idea of message passing with RecGNNs but implement it in a non-iterative manner. However, like classical convolutional networks, ConvGNNs are usually based on learned filters whose values are the same for every input fed to the network. In other words, the filter values are independent of the input values. Adapting the inner behaviour of an Artificial Neural Network (ANN) as a function of its input is, however, an open research area in the scientific community. In a nutshell, in standard approaches the ANN input-output relationship, i.e., the ANN behaviour, is completely defined after the training phase by a set of fixed network parameters (weights and biases). By contrast, the core idea of this research area is that the ANN behaviour also depends on the input itself or on additional inputs. One way to achieve this goal is to set the network parameters by means of another neural network, which receives the same input as the former network or external/additional inputs (see, for example, (Donnarumma et al., 2012) and the hypernetworks proposed in (Ha et al., 2017; Sun and Lee, 2017)). In this way, the ANN can dynamically change its behaviour according to the inputs it receives.
In this paper, we refer to this type of approach as Dynamic Behaviour Neural Networks (DBNNs). In recent years, several works following the DBNN approach have been proposed (Ha et al., 2017; Sun and Lee, 2017; Von Oswald et al., 2019). However, in the GNN field, to the best of our knowledge, the DBNN approach has received little attention. In a significant research work (Simonovsky and Komodakis, 2017), the authors proposed the Edge-Conditioned Convolution (ECC) network, which performs spatial convolutions over graph neighbourhoods by exploiting edge labels and generating input-specific filters from them. In (Nachmani and Wolf, 2019), the authors proposed a method to make the GNN message passing architecture adaptive to the input nodes using an external hypernetwork. Zhang et al. (Zhang et al., 2018) proposed the Graph HyperNetwork (GHN), a model whose weights are generated by a hypernetwork exploiting a computation graph representation. In (Balažević et al., 2019), a hypernetwork architecture is proposed to generate relation-specific convolutional filters for convolution on graphs. In this paper, we exploit the possibility of dynamically changing the behaviour of the convolutional filters as a function of the input and propose a novel method to perform spatial convolution on graph-structured data. We name our approach Dynamic Graph Convolutional Filters (DGCF). Following (Jia et al., 2016; Simonovsky and Komodakis, 2017), convolutional filters are generated by an external module, the filter-generating network, which, during the training stage, learns to produce input-specific filters in order to perform an ad-hoc filtering operation for each input sample. Unlike (Simonovsky and Komodakis, 2017), in our approach filters are generated from the node feature vectors instead of the edge labels.
Our approach is validated in three series of experiments, comparing it with standard convolution over 10 repetitions, with randomly initialized weights at each repetition. Hypothesis tests are reported for each series of experiments to verify the significance of the results. The advantages of the proposed approach can be summarised as follows:
• It inherits the standard convolution operation from classical CNNs. The convolution is performed on each node and its nearest neighbours by applying sets of fixed-size filters.
• The network's inner behaviour changes according to the input. The filters used for the convolution are dynamically generated from the input graphs by an external module.
• The dynamic behaviour of the convolutional filters allows simpler architectures than non-dynamic approaches in terms of the number of convolutional filters, while preserving performance. Promising results can be achieved using fewer convolutional filters than a non-dynamic approach.
• Training convergence can be reached in fewer epochs. We empirically show that, with the dynamic approach, the learning stage needs fewer epochs than with the non-dynamic approach.
This paper is organized as follows. Section 2 briefly reviews the related literature; Section 3 describes the proposed method; the experimental assessment is described in Section 4, while Section 5 presents and discusses the obtained results. Section 6 shows a visual analysis of the training stage of our proposal. Finally, Section 7 is devoted to concluding remarks.
In this section, we first report the related works in the context of the DBNN approach and then give a general description of Graph Neural Networks, focusing on ConvGNNs.
The idea of controlling the behaviour of an ANN through the input itself or an additional/external input has a long history in the literature (Jordan, 1990; Schmidhuber, 1992; Nishimoto et al., 2008; Paine and Tani, 2004; Noelle and Cottrell, 1995; Siegelmann, 2012; Eliasmith, 2005; Bishop, 1994; Donnarumma et al., 2012). For example, in (Paine and Tani, 2004) a set of external neural units, called control neurons, is bidirectionally connected to all the neurons of a lower-layer network, modulating their functions and favouring the generation of particular motor primitives. In (Bishop, 1994; Bishop et al., 1995), the authors describe a way to represent general conditional probability densities by considering a parametric model for the distribution, expressed as a neural network whose parameters are determined by the outputs of another neural network that receives the same input. In (Ha et al., 2017), the filters of CNNs and LSTM networks are generated by an auxiliary network. In (Jia et al., 2016), the authors introduced dynamic changes in ANN behaviour in the context of traditional CNNs, using a dynamic filter module to execute the convolution operation. The dynamic filter module consists of two parts: a filter-generating network, which generates the filters' parameters from a given input, and a dynamic filtering layer, which applies the generated filters to another input. In particular, the dynamic filtering layer can be instantiated as a dynamic convolutional layer when the filtering operation is translation invariant. In (Jia et al., 2016), given two input images $I_A$ and $I_B$, not necessarily different, the filter-generating network takes $I_A$ as input and outputs filters $F_\theta$ to be applied to $I_B$, where the filters $F_\theta$ are parameterized by $\theta$. In this way, an output $G = F_\theta(I_B)$ is generated.
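To make the two-part dynamic filter module concrete, the following is a minimal one-dimensional sketch in Python. It is an illustration under simplifying assumptions, not the implementation of (Jia et al., 2016): signals are plain lists of floats, and the filter-generating network is a hypothetical single linear map `theta` from $I_A$ to a filter of size $K$. All function names are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def filter_generating_network(theta: Matrix, i_a: Vector) -> Vector:
    """Hypothetical filter-generating network: a single linear map.

    `theta` has K rows, each of length len(i_a); the output is a filter of
    size K whose values depend on the conditioning input I_A.
    """
    return [sum(w * x for w, x in zip(row, i_a)) for row in theta]


def dynamic_filtering_layer(filt: Vector, i_b: Vector) -> Vector:
    """Apply the generated filter to I_B as a 'valid' 1D cross-correlation.

    The same filter is used at every position, so this instance of the
    dynamic filtering layer is a translation-invariant dynamic convolution.
    """
    k = len(filt)
    return [sum(filt[t] * i_b[p + t] for t in range(k)) for p in range(len(i_b) - k + 1)]


def dynamic_filter_module(theta: Matrix, i_a: Vector, i_b: Vector) -> Vector:
    """Compute G = F_theta(I_B), with the filter generated from I_A."""
    return dynamic_filtering_layer(filter_generating_network(theta, i_a), i_b)


if __name__ == "__main__":
    theta = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # K = 3 taps from a 2-value I_A
    i_b = [1.0, 2.0, 3.0, 4.0]
    print(dynamic_filter_module(theta, [0.5, -1.0], i_b))  # → [-1.5, -2.0]
    print(dynamic_filter_module(theta, [1.0, 1.0], i_b))   # → [3.0, 5.0]
```

The same $I_B$ yields different outputs for different conditioning inputs $I_A$, which is exactly the input-dependent behaviour the module is meant to provide.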
However, this method was developed in the context of classic CNNs; by contrast, in (Simonovsky and Komodakis, 2017), the authors perform a dynamic spatial convolution on graphs. They proposed the Edge-Conditioned Convolution (ECC), which uses a filter-generating network to dynamically output edge-specific filters for each input sample. The DGCF approach proposed in this paper is inspired by (Jia et al., 2016; Simonovsky and Komodakis, 2017). We perform a convolution on input graphs using filters that are dynamically generated by a filter-generating network, thus obtaining a dynamic change in the behaviour of the ConvGNN. We point out that, unlike the ECC proposed in (Simonovsky and Komodakis, 2017), where convolutional filters are edge-based, our strategy (see Section 3) uses node-based filters, so that the filtering operation on the nodes is tuned by the nodes themselves. In summary, the proposed approach differs from ECC in the input fed to the filter-generating network, and it differs from the other DBNN approaches because it is applied to graphs. GNNs have shown positive effects in several applications, for example road speed detection (Lu et al., 2020), molecular generation for drug discovery (Bongini et al., 2021), Point of Interest recommendation and others. In (Yu et al., 2021b), the authors showed how GNNs can be useful in COVID-19 diagnosis by exploiting underlying relations in chest images. A first proposal of a neural network model for graph-structured data was made in (Scarselli et al., 2008). This model builds on the idea that graph nodes represent "concepts" related to each other via edges. Each node $n$ is represented by a feature vector $\mathbf{x}_n$ and each edge $(i, j)$ is described by a feature vector $\mathbf{x}^{e}_{(i,j)}$. This model leverages the exchange of information between nodes and their neighbours to update their features iteratively (message passing mechanism).
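The iterative message passing of (Scarselli et al., 2008) can be illustrated with a deliberately simplified Python sketch. The transition function used here is a hypothetical choice, $h_n \leftarrow \tanh(a \cdot \mathrm{mean}_{u \in \mathcal{N}(n)} h_u + x_n)$, with scalar node features and no edge features. With $|a| < 1$ this update is a contraction, so repeating it converges to a unique fixed point, which is the kind of property the original model relies on. The function name and its parameters are illustrative assumptions.

```python
import math
from typing import Dict, List


def iterative_message_passing(
    features: Dict[int, float],
    neighbours: Dict[int, List[int]],
    a: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Dict[int, float]:
    """Toy RecGNN-style update, repeated until the node states stop changing.

    Hypothetical transition: h_n <- tanh(a * mean_{u in N(n)} h_u + x_n).
    Every node is updated from the previous states (synchronous update).
    """
    h = {n: 0.0 for n in features}
    for _ in range(max_iter):
        new_h = {}
        for n, x_n in features.items():
            nbrs = neighbours.get(n, [])
            message = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_h[n] = math.tanh(a * message + x_n)
        delta = max(abs(new_h[n] - h[n]) for n in h)
        h = new_h
        if delta < tol:  # approximately at the fixed point
            break
    return h


if __name__ == "__main__":
    x = {0: 0.1, 1: -0.2, 2: 0.3}
    nbrs = {0: [1], 1: [0, 2], 2: [1]}  # path graph 0 - 1 - 2
    states = iterative_message_passing(x, nbrs)
    print({n: round(v, 4) for n, v in states.items()})
```

The loop makes the computational issue mentioned above visible: every node state must be recomputed at each iteration until convergence, which becomes expensive on large graphs.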
In the literature, iterative graph processing techniques based on neural networks with a message passing architecture are generally referred to as RecGNNs. To address the computational costs of these methods, several kinds of neural network models have been proposed, often with the aim of generalising classical and established neural network data processing to graph data, such as (Goodfellow et al., 2014; Liu et al., 2021; Bahdanau et al., 2014; Velickovic et al., 2017). A comprehensive survey on the topic is provided in (Wu et al., 2020). Particular attention has been devoted in the literature to performing the convolution operation on graph-structured data. Graph Convolutional Neural Networks (ConvGNNs) share the idea of message passing adopted by RecGNNs but implement it in a non-iterative manner: information is exchanged between neighbours through several convolutional layers, each with its own filters (Wu et al., 2020). However, the non-Euclidean characteristics of graphs (e.g., their irregular structure) make the convolution and filtering operations harder to define than on images. For this reason, over the years researchers have been working on how to perform convolution on graphs using several approaches, which can be categorized as follows:
• spectral approaches, which rely on spectral graph theory, involving graph signal processing such as graph filtering and graph wavelets (see, for example, (Henaff et al., 2015; Li et al., 2018b; Levie et al., 2018));
• spatial approaches, which leverage structural information to perform convolution, such as aggregations of graph signals within the node neighbourhood (see, for example, (Atwood and Towsley, 2016; Hechtlinger et al., 2017)).
Although spectral architectures have been explored successfully in several works such as (Henaff et al., 2015; Li et al., 2018b; Levie et al., 2018), one of the main problems of ConvGNNs in the spectral domain is that the graph structure must be the same for all inputs, because the graph Laplacian is used in the training stage. Moreover, spectral analysis is computationally expensive, which limits the practical use of these methods on huge graphs (Balcilar et al., 2020). Although strategies to use ConvGNNs with different input graph structures have been proposed (Li et al., 2018b), this problem is generally not present in the spatial domain. For these reasons, several spatial domain methods have been proposed in the literature. For example, in (Niepert et al., 2016) the authors present PATCHY-SAN, a ConvGNN model inspired by the classical image-based CNN. In (Atwood and Towsley, 2016) and (Hechtlinger et al., 2017), two different methods are reported that generalise the convolutional operator using random walks to locate neighbourhoods. In (Fu et al., 2019), spectral-based GNNs are generalized to work on data structured as hypergraphs instead of classical graphs. In (Wu et al., 2019), the complexity of a GNN was reduced by collapsing the network layers into a single linear transformation. In the authors proposed a method to learn or refine the graph structure together with the network parameters. The aim of the work presented in this paper is to perform an adaptive spatial convolution on graph-structured data using fixed-size filters. Following the dynamic convolutional layer proposed in (Jia et al., 2016), in our approach a translation invariant set of filters is generated for each sample by a filter-generating network and shared among all the neighbourhoods. In this work, we propose a ConvGNN-based architecture whose convolutional filters change as a function of the input features. We name them Dynamic Graph Convolutional Filters (DGCF).
Unlike similar works such as (Simonovsky and Komodakis, 2017), where filters depend on the graph edges, we propose a DBNN approach based on the features of the graph nodes. In this section, after a brief introduction of the graph notation, a detailed description of our proposal is given. Let $G = (V, E)$ be an undirected or directed graph, where $V$ is a finite set of $N$ nodes and $E$ is a finite set of edges. We denote by $\mathbf{x}_i \in \mathbb{R}^{1\times J}$ the input feature vector of node $i \in V$, where $J$ is the number of input channels, and by $\mathbf{y}_i \in \mathbb{R}^{1\times M}$ its output feature vector, where $M$ is the number of output channels. Let $X \in \mathbb{R}^{N\times J}$ denote the matrix representation of an input graph, obtained by stacking the feature vectors of its nodes. In order to obtain, for each node, a neighbourhood with enough nodes to apply a filter of size $K$, we select its k-nearest neighbours using the classical shortest-path distance (Buckley and Harary, 1990). Neighbourhoods are uniquely defined for each node. This work aims to perform convolution on graphs using dynamically generated filters conditioned on a given input. As described above, unlike ECC (Simonovsky and Komodakis, 2017), where convolution is performed using dynamic edge-based filters, we use node-based filters dynamically generated from the nodes' feature vectors. Given the matrix representation $X$ of an input graph and a neural network $h_\theta(\cdot)$ with parameters $\theta$, which we refer to as the filter-generating network, we generate a set of input-specific filters $F = h_\theta(X)$ used to compute a dynamic convolution on the input graph. $F$ can be represented as a tensor $F \in \mathbb{R}^{J\times K\times M}$, where $J$ is the number of input channels, $K$ is the filter size and $M$ is the number of output channels.
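The neighbourhood selection and the dynamic filtering described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: neighbourhoods are computed by breadth-first search on an unweighted graph (so hop count equals shortest-path distance); each neighbourhood lists the node itself followed by its nearest nodes (whether the central node counts among the $K$ is an assumption); ties are broken by node index; and the filter-generating network $h_\theta$ is a hypothetical single fully connected layer with tanh mapping the flattened $X$ to $J \cdot K \cdot M$ values. The layer output for node $n$ and channel $m$ is the weighted sum $\sum_j \sum_k F_{jkm}\, x_{s(n,k)\,j}$, with the same $F$ shared by all neighbourhoods.

```python
import math
from collections import deque
from typing import Dict, List, Sequence

Matrix = List[List[float]]
Tensor3 = List[List[List[float]]]  # indexed as F[j][k][m]


def neighbourhood(adjacency: Dict[int, List[int]], node: int, k: int) -> List[int]:
    """Return `node` followed by its nearest nodes, K nodes in total.

    BFS on an unweighted graph visits nodes in non-decreasing shortest-path
    distance; neighbours are expanded in sorted order so ties are broken
    deterministically. Fewer than K nodes are returned if the component is small.
    """
    visited, order, queue = {node}, [node], deque([node])
    while queue and len(order) < k:
        current = queue.popleft()
        for nbr in sorted(adjacency.get(current, [])):
            if nbr not in visited:
                visited.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order[:k]


def filter_generating_network(x: Matrix, theta: Matrix, j: int, k: int, m: int) -> Tensor3:
    """Hypothetical h_theta: one fully connected layer with tanh.

    `theta` has J*K*M rows of length N*J; the flat output is reshaped to F[j][k][m].
    """
    flat = [v for row in x for v in row]
    out = [math.tanh(sum(w * v for w, v in zip(row, flat))) for row in theta]
    if len(out) != j * k * m:
        raise ValueError("theta must have J*K*M rows")
    return [[[out[(jj * k + kk) * m + mm] for mm in range(m)] for kk in range(k)] for jj in range(j)]


def dgcf_layer(x: Matrix, neighbourhoods: Sequence[Sequence[int]], f: Tensor3) -> Matrix:
    """y[n][m] = sum_j sum_k F[j][k][m] * x[s(n, k)][j], with F shared by all nodes."""
    j_dim, k_dim, m_dim = len(f), len(f[0]), len(f[0][0])
    y = []
    for nbhd in neighbourhoods:  # nbhd[k] plays the role of s(n, k)
        if len(nbhd) != k_dim:
            raise ValueError("each neighbourhood must contain exactly K nodes")
        y.append([
            sum(f[jj][kk][mm] * x[nbhd[kk]][jj] for jj in range(j_dim) for kk in range(k_dim))
            for mm in range(m_dim)
        ])
    return y


if __name__ == "__main__":
    adjacency = {0: [1], 1: [0, 2], 2: [1]}             # path graph 0 - 1 - 2
    x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]            # N = 3 nodes, J = 2 features
    nbhds = [neighbourhood(adjacency, n, 2) for n in range(3)]    # K = 2
    theta = [[0.1 * (r + c) for c in range(6)] for r in range(4)]  # J*K*M = 4 rows, N*J = 6 cols
    f = filter_generating_network(x, theta, j=2, k=2, m=1)         # input-specific filters
    print(nbhds)                    # → [[0, 1], [1, 0], [2, 1]]
    print(dgcf_layer(x, nbhds, f))  # one output channel per node
```

Because `f` is recomputed from `x` for every input graph, two graphs with the same topology but different node features are filtered with different weights, while a static ConvGNN layer would use one fixed `f` for all inputs.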
To compute the $m$-th output channel of node $n$, our proposal can be formalised as follows:

$$y_{nm} = \sum_{j=1}^{J}\sum_{k=1}^{K} F_{jkm}\, x_{s(n,k)\,j}$$

where $s(n, k)$ returns the index of the $k$-th neighbour of $n$ and $F_{jkm}$ are the filters generated by the filter-generating network $h_\theta(\cdot)$. In other words, $F_{jkm}$ is the value in position $(j, k)$ of the $m$-th filter generated by $h_\theta(\cdot)$. Thus, during the training stage, the parameters $\theta$ have to be learned together with the other parameters of the network. As described in (Jia et al., 2016), our approach follows the dynamic convolutional layer: the filter-generating network, defined as $h_\theta : \mathbb{R}^{N\times J} \to \mathbb{R}^{J\times K\times M}$, where $N$ is the number of vertices, $J$ is the number of input channels, $K$ is the filter size and $M$ is the number of output channels, takes the input graph as input and generates a unique set of filters shared among all the neighbourhoods. As said above, we refer to our proposal as Dynamic Graph Convolutional Filters (DGCF). A graphical representation of the DGCF layer is shown in Figure 1: given a node, labelled 0, acting as the central node of the convolution on a given input graph, an input-specific set of filters is first generated by the filter-generating network from the nodes' feature vectors of the input graph; then it is applied to the neighbourhood of node 0, denoted $\mathcal{N}(0)$, computing a new representation of it. This procedure is then repeated for all the nodes of the input graph. As the experimental results reported in Section 5 show, using a few dynamic convolutional filters (DGCF) leads to results comparable with those of traditional convolutional architectures with a higher number of filters. Figure 1: A graphical description of the DGCF layer processing a node (marked in red and labelled 0) using filters of size $K = 5$.
A unique set of filters, dependent on the feature vectors of all the nodes of the input graph, is dynamically generated and shared among all of its neighbourhoods. Finally, the input feature vectors of each neighbourhood are combined by a weighted sum to compute the nodes' outputs. In this example, input-specific filters are applied to the neighbourhood of node 0, denoted $\mathcal{N}(0)$. The DGCF approach was experimentally evaluated in three series of experiments. We point out that in these experiments we are interested in investigating the advantages of our method over non-dynamic approaches in terms of results, learning time and model simplicity. First, we conducted preliminary experiments on the well-known benchmark dataset MNIST, representing images as graphs (see (Hechtlinger et al., 2017)). Then, we conducted a series of experiments on 20NEWS (Joachims, 1996), a dataset commonly used in the ConvGNN literature. Finally, we conducted a series of experiments on a widely used electroencephalographic (EEG) signals dataset, SEED (Zheng and Lu, 2015), in order to test our method on harder tasks, such as emotion recognition from EEG signals. On this last task, a model analysis was carried out to investigate the functioning of the DGCF layer. In particular, the learning processes (in terms of weight updates at each epoch) of the static filters and of the filters generated by the filter-generating network were compared. For each series of experiments, we selected from the literature a reference paper proposing a non-dynamic ConvGNN trained on the same dataset, and used the same topology and experimental setup reported in that paper. We used the widely known MNIST handwritten digits dataset for the first series of experiments. The MNIST dataset consists of 70,000 grayscale images of handwritten digits.
The dataset is provided already split into training (60,000 samples) and test (10,000 samples) sets. Recognizing each digit can therefore be viewed as a 10-class classification problem. We adopted the same experimental setup on the MNIST dataset used in (Hechtlinger et al., 2017), which can be summarized as follows: after excluding constant pixels, the correlation matrix $C$ between pixels is computed to estimate their relationships. Then, two pixels (nodes) $i, j$ are considered connected if $|C_{ij}| > T$, where $T$ is a fixed threshold. The 20NEWS dataset consists of 17,236 text documents labelled with 20 classes. The original split into training set (10,167 samples after preprocessing) and test set (7,069 samples after preprocessing) was adopted in this work. This task can be defined as a 20-class classification problem. Following the preprocessing described in (Defferrard et al., 2016; Zhang et al., 2019), we considered only the 10,000 most frequently used words and represented each document with the bag-of-words model. We used word2vec (Mikolov et al., 2013) to represent each word as a vector. Finally, the cosine similarity between word vectors was adopted to compute the connections between words. The SEED dataset (Zheng and Lu, 2015) consists of EEG signals recorded from 15 subjects while they were watching video clips of about 4 minutes. Each video clip was carefully chosen to induce one of three types of emotions, i.e., negative, neutral and positive. For each subject, three sessions of 15 trials/video clips were collected. EEG signals were recorded on 62 channels using the ESI Neuroscan System. As in (Zhong et al., 2020), we consider the pre-computed differential entropy (DE) features smoothed by linear dynamic systems (LDS). DE features are pre-computed for each second, in each channel, over the following five bands: delta (1-3 Hz); theta (4-7 Hz); alpha (8-13 Hz); beta (14-30 Hz); gamma (31-50 Hz).
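The two graph constructions described above (pixel correlations for MNIST, cosine similarity between word vectors for 20NEWS) can be sketched in plain Python. This is a hedged illustration: the function names are hypothetical, Pearson correlation is assumed for the correlation matrix $C$, and, since the text does not specify how cosine similarities are turned into connections for 20NEWS, a fixed similarity threshold is assumed here purely for illustration.

```python
import math
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_graph(images: List[List[float]], t: float) -> Tuple[List[int], Set[Edge]]:
    """MNIST-style graph: drop constant pixels, connect pixels i, j if |C_ij| > t.

    `images` holds one flattened image per row; returns kept pixel indices and edges.
    """
    columns = [list(col) for col in zip(*images)]  # one value sequence per pixel
    kept = [p for p, col in enumerate(columns) if max(col) != min(col)]
    edges = set()
    for pos, i in enumerate(kept):
        for j in kept[pos + 1:]:
            if abs(pearson(columns[i], columns[j])) > t:
                edges.add((i, j))
    return kept, edges


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def cosine_graph(word_vectors: List[List[float]], t: float) -> Set[Edge]:
    """20NEWS-style graph (assumed thresholding rule): connect words with cosine > t."""
    n = len(word_vectors)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if cosine(word_vectors[i], word_vectors[j]) > t}
```

The quadratic loops are fine for a sketch; for the 10,000-word vocabulary or the full MNIST pixel grid, a vectorized implementation would be the practical choice.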
Samples were modelled as graphs by considering each EEG channel as a node and the DE features of the 5 bands as the feature vector of each node. The adjacency matrix $A \in \mathbb{R}^{n\times n}$, where $n$ is the number of EEG channels, was modelled according to the disposition of the EEG channels on the scalp. In particular, each entry $A_{ij}$ represents the physical distance between sensor $i$ and sensor $j$, computed with reference to the International 10/20 Positioning System. According to this system, electrodes are placed at a distance of 10% (or 20%) of the Inion-Nasion distance (Myslobodsky and Bar-Ziv, 1989). Table 1 reports a summary of the datasets used in our experiments. The proposal was validated by analyzing its impact on existing architectures in the literature (Hechtlinger et al., 2017; Zhong et al., 2020; Zhang et al., 2019). For each architecture, we evaluated the models as the number of convolutional filters changes, as shown in the corresponding configuration tables (Tables 2, 4 and 6). For the experiments on the MNIST and 20NEWS datasets, model performance was estimated using a holdout method, since both datasets come with a predefined train/test split. Moreover, model performance was averaged over 10 repetitions, where, at each repetition, the model parameters were reinitialized following the same initialization criterion. For the experiments on the SEED dataset, model performance was estimated in the subject-independent classification setting: still following the experimental protocol of (Zhong et al., 2020), we adopted leave-one-subject-out cross-validation, in which 14 subjects form the training set and the remaining subject forms the test set. This was repeated for each possible configuration. The performance is evaluated by averaging the test accuracies, using one session of data.
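The subject-independent protocol can be sketched as a leave-one-subject-out loop. In this minimal Python illustration, `train_and_score` is a hypothetical callback standing in for training a model on the training subjects and returning its test accuracy on the held-out subject; reporting the mean and the sample standard deviation of the resulting accuracies is an assumption about how the "mean ± std" figures are aggregated.

```python
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple


def leave_one_subject_out(subjects: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (training subjects, test subject); every subject is held out exactly once."""
    for test_subject in subjects:
        yield [s for s in subjects if s != test_subject], test_subject


def loso_accuracy(
    subjects: Sequence[int],
    train_and_score: Callable[[List[int], int], float],
) -> Tuple[float, float]:
    """Run the full LOSO loop and return (mean, sample std) of the test accuracies."""
    accuracies = [train_and_score(train, test) for train, test in leave_one_subject_out(subjects)]
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    subjects = list(range(1, 16))  # the 15 SEED subjects
    # Dummy scorer for illustration only: a real one would train and evaluate a model.
    mean_acc, std_acc = loso_accuracy(subjects, lambda train, test: 0.80 + 0.001 * test)
    print(f"{mean_acc:.4f} ± {std_acc:.4f}")
```

With 15 subjects this produces 15 train/test splits, each with 14 training subjects, matching the protocol described above.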
Moreover, in each experiment, 30% of the training set was extracted during the training stage using stratified sampling (Parsons, 2014). Each experiment was run with early stopping as the convergence criterion. The significance of the differences between the models was assessed using hypothesis testing. For each result (expressed by the average and the standard deviation), a normality test was first performed using the Shapiro-Wilk test (Shapiro and Wilk, 1965). Then, according to the outcome of the normality tests, hypothesis tests were performed using Student's t-test (Student, 1908) for normally distributed data, or the Mann-Whitney U-test (Mann and Whitney, 1947) otherwise. For each test, a significance level of $\alpha = 0.05$ was considered. Hypothesis tests were formulated as follows:
Fully connected neural networks were adopted as filter-generating network (FGN) architectures, and their number of units and layers was tuned in each experiment using Bayesian optimization (Snoek et al., 2012). In each experiment, the adopted models were two-layer architectures $L_1 - L_2$, with $L_i \in \{Ct, DCt, FCn, Pooling\}$, where $Ct$ and $DCt$ denote a static ConvGNN layer and a DGCF layer with $t$ filters, respectively, $FCn$ denotes a fully connected layer with $n$ hidden units, and $Pooling$ denotes a global pooling layer (Xu et al., 2018). For the MNIST dataset, the configuration C20 − C20 proposed by (Hechtlinger et al., 2017) was used. In the experimental setup of this work, a comparison was made by replacing the first convolutional layer with a dynamic one. In our analysis, we varied the number of convolutional filters of the first layer, in both the static and the dynamic models, in order to evaluate the effectiveness of the dynamic approach. A summary of the adopted configurations is shown in Table 2. As shown in Figure 2 (a), the introduction of the dynamic layer increases the average performance of the architecture.
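The non-parametric branch of the testing procedure can be illustrated with a small, dependency-free Python sketch of the two-sided Mann-Whitney U test. It uses the large-sample normal approximation without tie or continuity correction, which is only a rough approximation for samples of 10 repetitions; in practice, `scipy.stats` provides `shapiro`, `ttest_ind` and `mannwhitneyu` for the full procedure (a Shapiro-Wilk test followed by a t-test or a U-test). The helper names below are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test, normal approximation (no tie/continuity correction).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


if __name__ == "__main__":
    static_acc = [0.70 + 0.001 * i for i in range(10)]   # hypothetical accuracies
    dynamic_acc = [0.80 + 0.001 * i for i in range(10)]
    u, p = mann_whitney_u(static_acc, dynamic_acc)
    print(u, p < 0.05)  # → 0.0 True
```

For the small samples used here, an exact test (as offered by `scipy.stats.mannwhitneyu`) would be preferable to the normal approximation shown in the sketch.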
Moreover, it is interesting to notice that our dynamic approach leads to good performance also with simpler architectures: results comparable with those reported in (Hechtlinger et al., 2017) are achieved using fewer convolutional filters. It is also important to point out that, with our approach, convergence during the training stage was reached in fewer epochs, as shown in Figure 2 (b). Table 3 reports the results of the hypothesis tests: the null hypothesis was rejected for each configuration (p < 0.05), confirming the significance of the improvement given by our approach.
… | 0.0322
C20 − C20 | 0.0420
On these data, we referred to the architecture presented in , consisting of C16 − FC100. Also in this case, the comparisons were made by varying the number of convolutional filters and considering the convolutional layer both as static and as dynamic. A summary of the adopted configurations is shown in Table 4. As shown in Figure 3 (a), the introduction of the dynamic layer increases the average performance. In Figure 3 (b), we can again observe that our method converges faster than the static one. Table 5 reports the results of the hypothesis tests: in each case, the null hypothesis was rejected, confirming the significance of the improvement given by our method.
Configuration | p-value
C2 − C20 | 0.0015
C4 − C20 | 0.0010
C8 − C20 | 0.0010
C16 − C20 | < 0.0001
The base architecture considered for this experiment was similar to the RGNN model proposed by (Zhong et al., 2020), i.e., Ct − Pooling. Unlike (Zhong et al., 2020), domain adaptation techniques (Wang and Deng, 2018) were not adopted in this series of experiments. Moreover, we chose global mean pooling (Xu et al., 2018) across all the nodes of the graph instead of sum pooling, since it gave better results in preliminary exploratory experiments.
Also in this case, comparisons were made by considering the convolutional layer both as static and as dynamic, and by varying the number of convolutional filters. Finally, weight decay was introduced to reduce model complexity. A summary of the adopted configurations is shown in Table 6. As shown in Figure 4 (a), the introduction of the dynamic layer strongly increases the average performance of the architecture. Figure 4 (b) highlights the faster convergence of our method compared with the static one. Table 7 reports the results of the hypothesis tests: except for C2, the null hypothesis was rejected for each configuration, confirming that the improvement given by our method is significant. It is interesting to notice what is shown in Table 8: with reference to the results presented in (Zhong et al., 2020), our method outperforms some domain adaptation techniques, such as TCA.
Configuration | p-value
C2 | 0.3118
C4 | 0.0138
C8 | < 0.0001
C16 | 0.0088
C32 | 0.0007

Method | Accuracy (mean ± std)
SVM | 56.73 ± 16.29
TCA* (Pan et al., 2010) | 63.64 ± 14.88
SA* (Fernando et al., 2013) | 69.00 ± 10.89
T-SVM (Collobert et al., 2006) | 72.53 ± 14.00
DGCNN (Song et al., 2018) | 79.95 ± 09.02
DAN* (Li et al., 2018a) | 83.81 ± 08.56
BiDANN-S* (Li et al., 2018c) | 84.14 ± 06.87
BiHDM* (Li et al., 2020) (SOTA) | 85.40 ± 07.53
RGNN* (Zhong et al., 2020) | 85.30 ± 06.72
DGCF (ours) | 81.76 ± 05.38
Table 8: Subject-independent best average classification accuracy (mean ± std) on the SEED dataset using different methods, as reported in (Zhong et al., 2020). The last row reports the best average accuracy of the proposed model. Methods marked with * involve the use of Domain Adaptation techniques.

In this section, we propose a visual analysis of the training stages of both the dynamic and the static approach on the SEED dataset. The main aim of this analysis is to inspect how filter generation differs according to the input.
In particular, assuming that equally-labeled samples share common patterns, we expect the filters generated for samples of the same class to be similar to each other, and different for samples belonging to different classes. We chose the SEED dataset for this analysis since it has the lowest number of classes among the datasets used in the experimental assessment. The analysis was carried out using a randomly chosen subject as test set and the remaining subjects as training set. We considered configuration C2 − AvgPool for the static model and DC2 − AvgPool for the dynamic model. The remaining details of the experimental setup follow Table 6. The learning processes are described graphically by the weight distribution after each training epoch. Figure 5 shows the training of both static filters. Figure 6 shows the training of both dynamic filters, produced by the filter-generating network, for each label (negative, neutral and positive, from left to right). Since a distinct filter is generated for each sample, to evaluate fairly how the filters are distributed at each epoch, the filters of the samples correctly classified in that epoch are collected and averaged. Figure 6 shows that the filter generation process differs for each class: assuming that equally-labeled data share common patterns, the final parameter configurations cover different ranges of values for inputs belonging to different classes, implying that specific filters are obtained according to the features of each input. As a direct consequence, feature extraction in the main architecture is enhanced by patterns intrinsic to the sample itself. In fact, in almost all cases satisfactory results are achieved using a low number of filters.
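The per-epoch averaging described above can be sketched in plain Python. This is a minimal illustration under our own assumptions: the function name, the integer label encoding and the toy 2 × 2 filters are hypothetical (the filters analysed here are 9 × 5). Besides the element-wise average, the sketch also returns the element-wise standard deviation, the quantity reported in the heatmap cells for the dynamic filters.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Filter = List[List[float]]  # K x J weights generated for one sample


def average_filters_by_class(
    filters: Sequence[Filter],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> Dict[int, Tuple[Filter, Filter]]:
    """Return, for each class, the element-wise mean and standard deviation of
    the filters generated for the correctly classified samples of that class."""
    groups: Dict[int, List[Filter]] = defaultdict(list)
    for w, y, y_hat in zip(filters, labels, predictions):
        if y == y_hat:  # keep only correctly classified samples
            groups[y].append(w)

    summary: Dict[int, Tuple[Filter, Filter]] = {}
    for label, ws in groups.items():
        k, j = len(ws[0]), len(ws[0][0])
        avg = [[mean([w[r][c] for w in ws]) for c in range(j)] for r in range(k)]
        std = [[pstdev([w[r][c] for w in ws]) for c in range(j)] for r in range(k)]
        summary[label] = (avg, std)
    return summary


# Toy epoch: three samples with 2 x 2 generated filters (hypothetical values).
# Labels: 0 = negative, 1 = neutral, 2 = positive.
filters = [
    [[0.5, -0.25], [0.0, 0.75]],
    [[0.25, -0.75], [0.0, 0.25]],
    [[-0.5, 0.25], [0.5, 0.0]],
]
labels = [0, 0, 2]
predictions = [0, 0, 1]  # the third sample is misclassified and discarded

summary = average_filters_by_class(filters, labels, predictions)
avg_negative, std_negative = summary[0]
print(avg_negative)  # [[0.375, -0.5], [0.0, 0.5]]
```

Repeating this computation after every epoch gives one averaged filter per class and epoch, which is what the per-label distributions summarise.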
Moreover, comparing Figures 5 and 6, we notice a difference in the ranges of values covered by the final configurations of the two models: static filters have weights in the range [−1, 1.5], while the dynamic ones have weights overall in the range [−0.6, 0.5]. This suggests that the dynamic layer could lead to less complex models, thus helping to avoid over-fitting (Bishop, 2014).

Figure 7: Heatmap representations of the weights of the first (a) and second (b) filter of a static convolutional layer at the end of the learning process on the SEED dataset. Filters are represented by matrices $W \in \mathbb{R}^{9 \times 5}$, where 9 is the kernel size and 5 is the number of input features, corresponding to the five EEG bands (see text for further details).

Figures 7 (static model) and 8 (dynamic model) show heatmaps of the filters in the final configuration of each model. Given a generic filter $W \in \mathbb{R}^{K \times J}$, the entry $W_{ij}$ is the weight applied to the j-th feature of the i-th neighbour. For the dynamic case, the heatmap cells report the standard deviations of the weights since, as described above, for this analysis the filters were averaged over a selected group of samples. Figure 8 clearly confirms what we observed in Figure 6: the filter-generating network generates different filters according to the sample label. It is interesting to note how different features are emphasised in the label-related filters: for example, for the positive label, both the first and the second filter emphasise features 3 and 4 (corresponding to the beta and gamma bands) with large absolute values for every neighbour, whereas the filters related to the negative label emphasise feature 1, corresponding to the theta band.

Figure 8: Heatmap representations of the weights of the first (a) and second (b) filter of a dynamic convolutional layer at the end of the learning process on the SEED dataset, averaged over the correctly classified samples. For each filter, the averages for the negative (left), neutral (center) and positive (right) labels are shown. Filters are represented by matrices $W \in \mathbb{R}^{9 \times 5}$, where 9 is the kernel size and 5 is the number of input features, corresponding to the five EEG bands (see text for further details).

In this work, we have proposed a dynamic method to perform spatial convolution on graph-structured data. Combining the idea of dynamically changeable behaviours in ANNs with convolutional graph neural networks, this work aimed to present a graph convolutional layer capable of performing convolution using node-specific filters in order to achieve an input-specific filtering operation. We altered the behaviour of a convolutional layer dynamically using a filter-generating network. In this way, the proposed graph convolutional layer learns and applies input-specific filters, customising the filtering operation according to its input graph. We ran a series of experiments to assess the improvements obtained by using a dynamic approach to generate convolutional filters. The experiments showed that the proposed strategy achieves better performance than static convolution on graphs. As observed in Figures 6 and 8, the filter-generating networks learn to produce class-specific filters, effectively making the convolution operation input-specific.
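The mechanism summarised above, a filter-generating network that computes the convolutional filters from the input before they are applied, can be illustrated with a minimal plain-Python sketch. It rests on assumptions made only for illustration: each node's neighbourhood is a precomputed list of K node indices on a fixed topology, the filter-generating network is a single linear layer applied to the flattened input graph, and one set of filters is generated per input graph and shared by all its nodes. The generator and neighbour-selection procedure actually used in the paper may differ, and all names are hypothetical. Following the heatmap convention, W[i][j] multiplies the j-th feature of the i-th neighbour.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def graph_conv(x: Matrix, neighbours: Sequence[Sequence[int]],
               filters: Sequence[Matrix]) -> Matrix:
    """Spatial graph convolution with K x J filters.

    For node v and filter W: y[v] = sum_i sum_j W[i][j] * x[neighbours[v][i]][j],
    i.e. W[i][j] weights the j-th feature of the i-th neighbour of v.
    Returns an N x F matrix (one output channel per filter)."""
    out = []
    for nbrs in neighbours:
        out.append([
            sum(w[i][j] * x[n][j] for i, n in enumerate(nbrs) for j in range(len(x[n])))
            for w in filters
        ])
    return out


def generate_filters(x: Matrix, a: Matrix, c: List[float],
                     n_filters: int, k: int) -> List[Matrix]:
    """Hypothetical filter-generating network: one linear layer that maps the
    flattened input graph vec(x) (N*J values) to n_filters filters of shape k x J."""
    flat = [value for row in x for value in row]
    j = len(x[0])
    params = [sum(a_t * f_t for a_t, f_t in zip(a_row, flat)) + c_r
              for a_row, c_r in zip(a, c)]
    size = k * j
    return [[params[f * size + i * j:f * size + (i + 1) * j] for i in range(k)]
            for f in range(n_filters)]


# Toy setting (hypothetical sizes): N nodes, J features, kernel size K, F filters.
random.seed(0)
N, J, K, F = 4, 2, 3, 2
# Fixed topology: each node's K neighbours (the node itself first).
neighbours = [[0, 1, 2], [1, 0, 3], [2, 0, 3], [3, 1, 2]]
# Trainable parameters of the generator (randomly initialised here).
a = [[random.gauss(0.0, 0.1) for _ in range(N * J)] for _ in range(F * K * J)]
c = [0.0] * (F * K * J)

x = [[random.random() for _ in range(J)] for _ in range(N)]
w = generate_filters(x, a, c, F, K)  # filters computed from this input
y = graph_conv(x, neighbours, w)     # N x F output
```

Passing a fixed list of trainable filters to graph_conv instead of the output of generate_filters gives the static counterpart. Because the generator reads the whole flattened graph, its input size N·J must stay constant, which reflects the restriction to fixed graph topologies.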
Furthermore, convergence was reached in fewer epochs, significantly reducing training time. Finally, with regularisation, the external filter-generating module leads to filters with smaller weights than the static filters, which results in an overall lower complexity of the main architecture. These aspects were evident in emotion recognition from EEG signals, which is a challenging task. In future work, we would like to introduce and analyse a dynamic local filtering layer, with local filters generated for each neighbourhood, as proposed by (Jia et al., 2016) in the image domain. Currently, since the filter-generating network takes the entire graph as input, our strategy is restricted to data with a fixed graph topology. Using a local dynamic convolution, we could overcome this limitation by extending the layer to data with variable graph topologies.

Diffusion-convolutional neural networks
Neural machine translation by jointly learning to align and translate
Hypernetwork knowledge graph embeddings
Bridging the gap between spectral and spatial domains in graph neural networks
Mixture density networks. Working Paper, Aston University
Pattern recognition and machine learning (Bishop). Springer
Neural networks for pattern recognition
Molecular generative graph neural networks for drug discovery
Distance in graphs
Large scale transductive SVMs
Geometric attentional dynamic graph convolutional neural networks for point cloud analysis
Convolutional neural networks on graphs with fast localized spectral filtering
Programming in the brain: a neural network theoretical framework
A unified approach to building and controlling spiking attractor networks
Unsupervised visual domain adaptation using subspace alignment
HpLapGCN: Hypergraph p-Laplacian graph convolutional networks
Generative adversarial nets
A new model for learning in graph domains
5th International Conference on Learning Representations
A generalization of convolutional neural networks to graph-structured data
Deep convolutional networks on graph-structured data
Identification of acoustic emission sources for structural health monitoring applications based on convolutional neural networks and deep transfer learning
Dynamic filter networks
A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization
Attractor dynamics and parallelism in a connectionist sequential machine
Convolutional neural networks for sentence classification
ImageNet classification with deep convolutional neural networks
Deep learning
Gradient-based learning applied to document recognition
CayleyNets: Graph convolutional neural networks with complex rational spectral filters
Cross-subject emotion recognition using deep adaptation networks
Adaptive graph convolutional neural networks
A novel bi-hemispheric discrepancy model for EEG emotion recognition
A bi-hemisphere domain adversarial neural network model for EEG emotion recognition
Neighbor-anchoring adversarial graph neural networks
LSTM variants meet graph neural networks for road speed prediction
Weakly supervised object-aware convolutional neural networks for semantic feature matching
On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics
Efficient estimation of word representations in vector space
Locations of occipital EEG electrodes verified by computed tomography
Hyper-graph-network decoders for block codes
Learning convolutional neural networks for graphs
Learning multiple goal-directed actions through self-organization of a dynamic neural network model: A humanoid robot experiment
Towards instructable connectionist systems
Motor primitive and sequence self-organization in a hierarchical recurrent neural network
Domain adaptation via transfer component analysis
Stratified sampling
The graph neural network model
Learning to control fast-weight memories: An alternative to dynamic recurrent networks
An analysis of variance test for normality
Neural networks and analog computation: beyond the Turing limit
Dynamic edge-conditioned filters in convolutional neural networks on graphs
Practical Bayesian optimization of machine learning algorithms
EEG emotion recognition using dynamical graph convolutional neural networks
Supervised neural networks for the classification of structures
The probable error of a mean
Addressing class-imbalance in multi-label learning via two-stage multi-label hypernetwork
Graph attention networks. stat 1050
Continual learning with hypernetworks
Deep visual domain adaptation: A survey
Simplifying graph convolutional networks
A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020
Accurate and automatic tooth image segmentation model with deep convolutional neural networks and level set method
Convolutional neural networks for medical image analysis: state-of-the-art, comparisons, improvement and perspectives
ResGNet-C: A graph convolutional neural network for detection of COVID-19
Graph hypernetworks for neural architecture search
Leveraging graph neural networks for point-of-interest recommendations
Learning graph structure via graph convolutional networks
Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks
EEG-based emotion recognition using regularized graph neural networks