key: cord-0940368-qg1uv3dg authors: Yu, Xiang; Lu, Siyuan; Guo, Lili; Wang, Shui-Hua; Zhang, Yu-Dong title: ResGNet-C: A graph convolutional neural network for detection of COVID-19 date: 2020-12-30 journal: Neurocomputing DOI: 10.1016/j.neucom.2020.07.144 sha: c5f7ce181345946c0b031253055fd1a9537853fa doc_id: 940368 cord_uid: qg1uv3dg The widely spreading COVID-19 has caused thousands of hundreds of mortalities over the world in the past few months. Early diagnosis of the virus is of great significance for both of infected patients and doctors providing treatments. Chest Computerized tomography (CT) screening is one of the most straightforward techniques to detect pneumonia which was caused by the virus and thus to make the diagnosis. To facilitate the process of diagnosing COVID-19, we therefore developed a graph convolutional neural network ResGNet-C under ResGNet framework to automatically classify lung CT images into normal and confirmed pneumonia caused by COVID-19. In ResGNet-C, two by-products named NNet-C, ResNet101-C that showed high performance on detection of COVID-19 are simultaneously generated as well. Our best model ResGNet-C achieved an averaged accuracy at 0.9662 with an averaged sensitivity at 0.9733 and an averaged specificity at 0.9591 using five cross-validations on the dataset, which is comprised of 296 CT images. To our best knowledge, this is the first attempt at integrating graph knowledge into the COVID-19 classification task. Graphs are constructed according to the Euclidean distance between features extracted by our proposed ResNet101-C and then are encoded with the features to give the prediction results of CT images. Besides the high-performance system, which surpassed all state-of-the-art methods, our proposed graph construction method is simple, transferrable yet quite helpful for improving the performance of classifiers, as can be justified by the experimental results. The outbreak of a new coronavirus, which was later named as COVID-19, has surprised the world with its high infectivity and fatality rate [1] . Currently, more than seven million people around the world have been diagnosed with the virus while more than 406 thousand of them have died because of the virus, until 9/ June/2020. Due to the influence of the virus, millions of cities have locked down which severly caused the economy damage all over the world to slow the spreading of the virus as there is no vaccine available yet. Patients being infected may show symptoms like having fever, coughing in a few days and could develop into severe illness or even die quickly. An early diagnosis turns out to be the most effective way for the necessary treatment and stopping the spreading of the virus. Viral nucleic acid test and CT screening are the two most widely used techniques for clinical diagnosis. However, the viral nucleic acid test requires sophisticated devices and takes a long time to give the diagnostic result [2] . Additionally, the high false-negative rate indirectly facilitates the spread of the viral COVID-19. Compared to the viral nucleic acid test, chest CT images are reported of high sensitivity [3] . However, manual interpretation of CT scanned images is time-consuming and is unstable due to the personal experience of radiologists and artefacts such as fatigue. Therefore, developing an automated computer-aided diagnostic system with high accuracy is demanding. https://doi.org/10.1016/j.neucom.2020.07.144 0925-2312/Ó 2020 Elsevier B.V. All rights reserved. Recent years, we have witnessed significant advancement of deep learning, which was highly featured by Convolutional neural network (CNN). In 2012, a new structured CNN named AlexNet showed incomparable superiority to other traditional methods on natural image classification task [4] . Later on, the performance of CNNs has been greatly improved thanks to the upgraded hardware and novel architectures. Compared to the Central Processing Unit (CPU), Graphics Processing Unit (GPU) shows more powerful parallel computing power in graphics processing. New generations of GPUs, which benefit from the advancement of Integrated Circuits (IC) and electron components, have been equipped with even stronger computation capability. Besides the improvement of hardware, structural innovations of CNNs considerably boosted the performance of the state-of-the-art networks. By increasing the depth and width of networks, GoogLeNet [5] , a 22-layer network, more efficiently utilizes the computing resources in the network compared to the previously developed networks. However, one problem brought in by the increase of networks' depth and width is gradient vanishing in the training process. To mitigate the training difficulty, residual networks [6] , which incorporated residual learning, achieved even better performance on ImageNet while the complexity remains lower than VGG [7] . Compared to traditional convolution, depthwise separable convolution produces fewer parameters when the same sizes of the generated output are given [8] . Novel convolution technique named dilated convolution was proposed to enable a larger receptive field of convolutional kernels without introducing extra parameters. Alongside the improved performance of CNNs, CNNs have also been widely used in engineering, business, medicine [9] [10] [11] . CNNs have demonstrated powerful capability on extracting and combining spatial features to form high-level features, which turn out to be more representative compared to manual-crafted features. Recently numerous reports also demonstrated the success of AI and machine learning [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] , and other applications on brain functional network [23] [24] [25] [26] [27] , indoor detection [28, 29] , fingerspelling [30] [31] [32] , and image analysis [33] . However, the applications of CNNs are greatly hindered because the underlying relationships between each element are ignored. Graph convolutional neural network (GCN), which can better deal with graph-structured data, therefore, is receiving more attention from different areas. Each element in GCNs is taken as a node while the relationships between elements are denoted by edges. Aggregation of neighbourhood features is one of the keys that differentiates GCNs from CNNs. Given the advantages of GCNs, they have also been widely used in areas including image classification and semantic segmentation [12] . Considering the benefits of GCNs, we developed a highperformance system for the detection of COVID-19 based on GCNs. In this paper, we first proposed a novel ResGNet framework, and then we proposed three different models for the detection of COVID-19 when implemening models under the framework: (i) We first proposed ResNet101-C model, which takes ResNet101 as the backbone. The models ending with ''-C" here and after stands for the classifiers that detect COVID-19. The purposes of the proposition of ResNet101-C are two folds. (a) ResNet101-C alone can be used to classify lung CT images into normal and pneumonia and therefore detect COVID-19. (b) More importantly, after training with the interested training set, ResNet101-C is enabled to extract more representative features for the rest two models NNet-C and ResGNet-C. We first add more layers to the top of ResNet101, which is close to the output, to obtain ResNet101-C and have it trained with the training set. High-level features of 128 dimensions are then extracted from the activation layer in the ResNet101-C for further classification though ResNet101-C alone can work as a classifier for the binary classification task here. (ii) NNet-C, a one-layer neural network, is a simple classifier that takes features extracted by ResNet101-C as input. Also, the proposition of NNet-C mainly comes from two parts. One is that NNet-C alone is a high-performance classifier on detection of pneumonia caused by COVID-19. Another main reason is that NNet-C, which shares the same architecture as the graph neural network (GNN) in ResGNet-C, can be used as the model for comparison with models built under ResGNet-C framework. (iii) ResGNet-C. In ResGNet-C, ResNet101-C is reutilized for feature extraction. Graphs of the extracted features are constructed accordingly and are then combined with the extracted features as the input to a 1-layer GNN in ResGNet-C for classification. To build graphs, images both in the train set and test set are divided into batches to obtain features in batches. In each batch of the train set, each feature is taken as one node of the graph while the edges between nodes are built according to the top k neighbours with the highest similarities. Inspired by [34] , we hypothesis that there are edges between one node and other top k nearest nodes. The distance between nodes is measure by Euclidean distance while edges are quantified by the adjacency matrix. The 1-layer GNN, though shallow in terms of depth, however, showed powerful performance after trained with graph-combined features. Experiments on our COVID-19 dataset demonstrated that our proposed ResGNet-C achieved the highest accuracy on the dataset by using the 5-cross-validations, which surpasses all of the stateof-the-art models. The contributions of this paper are mainly three-folds. One is that this is the first attempt to date to combine graph representation with CNN on COVID-19 detection. Another is that we proposed a ResGNet framework that easily integrates graph representation of features, which is also extendable to other diseases detection area. The last and most important one is that we developed three different models for the detection of COVID-19 while ResGNet-C performed best. This paper is organized as follows. In Section 2, we will briefly introduce the basic of GCN. Then we will move to implementations of ResGNet framework in Section 3. Experiment design and results will be presented in Sections 4 and 5, respectively, while we conclude this paper in Section 6. Our world has been flooded with countless data from areas of chemistry, biology and computer science. To represent the underlying relationships of data, graphs are widely used. GNNs are therefore developed to the process the graphs constructed based on the data. Given the graph G and one of its nodes n, GNNs can be represented as learnt functions that map G and n to vectors of reals: where m is the dimension of Euclidean space [35] . GNNs can be extended to GCNs by introducing convolutions. There are considerable works that successfully built graphs for application of GCNs in various areas [12, 36] . A Text GCN was built in [37] that achieved better performance than the state-of-the-art techniques on text classification. The single text graph was created according to word-occurrence in the particular corpus. The recommendation system is another popular application scenario of GCN. In [38] , authors encoded graph convolution with RNNs to mine possible pattern of users and items. There are also successful works of GCNs on medical image analysis. To overcome the shortcomings of patched-based in colorectal cancer grading, Yanning et el. proposed so-called CGC-Net that takes the nucleus in a histology image as nodes while cellular interactions are mapped as edges [39] . Another GCN for cervical cell classification was developed in [34] . k-means clustering is applied to features extracted by CNNs to acquire centroids, based on which each node is depicted by the nearest centroid. The element in the adjacency matrix is set to one when two nodes are found to be neighbours to each other according to the Euclidean distance. As the global situation is worsening due to COVID-19, experts are devoted to solving the global issue collaboratively. To facilitate the detection of COVID-19, researchers from both clinical area [40] [41] [42] and computer science area [43] [44] [45] have contributed their valuable efforts. Bilateral pulmonary parenchyma ground-glass and consolidative pulmonary opacities are typical CT findings, which, therefore, could be determinants on making the diagnosis of COVID-19. It was found by Wang et al. [46] that multilobe lesions appeared in most patients while those lesions were usually shown in the peripheral zone and the central zone. Patchy shadows, ground-glass opacity and consolidation change are demonstrated among patients. Similar conclusions were found by Zhao et al. [47] . As pointed out by them, chest CT images of patients being infected by COVID-19 are highly featured by lungs that were in the presence of bilateral ground-glass lesions. The progressive lesions are, however, consolidations with no migratory lesions. In the computer science community, there are also considerable works that aim at giving reliable diagnostic results based on images. A deep learning model named (COVID-Net) feed with Xrays images was proposed to detect COVID-19 cases. The results on open source repositories showed promising results while the developed AI system is publically available. In [43] , the authors developed a detection system by using inception blocks [5] . However, the accuracy, which was 0.7310, is still quite low compared to other existing models. In another work [44] , the authors developed a 3D deep learning framework by deploying ResNet50 as the backbone. The sensitivity and specificity were also more than 0.90, which gave an overall accuracy of 0.96. Clinically, sensitivity is more important than specificity to some extent. In work [45] , the author developed an automatic detection system that achieved 100% sensitivity. Though experts are consolidated on providing diagnostic results in faster and more accurate ways. However, there are still many challenges ahead. For radiologists, artificial factors such as fatigue and personal experience would heavily harm the accuracy of the diagnostic results. Also, it is challenging for radiologists to provide accurate results in a limited time. For the automatic detection systems, high sensitivity for most of the systems is still one main obstacle that prevents them from gaining better accuracy given the complexity of images and other possible factors. Also, all of the methods mentioned above ignored to consider the underlying relation between each chest images. Considering this, we proposed our ResGNet framework to pave the way for future research on the application of GCN to the detection of COVID-19. Besides, two other models named ResNet101-C and NNet-C that showed certain capability on detection of COVID-19 are developed as well. ResGNet frameworks, which is flexible for transplanting to different image classification tasks, consists of pre-trained CNNs for feature extraction and GNN for classification. The pre-trained CNNs could be trained on different image data sets and then be directly transferred to the classification task. However, many works showed that the performance of pre-trained CNNs would be significantly improved if they can be fine-tuned by datasets that are highly similar to the interested datasets [48, 49] . Considering this, the CNNs that are pre-trained on natural images are firstly trained on the concerned datasets. Therefore, the top layers, which are close to the output, of the pre-trained layers have to be modified. After training with the interested datasets, the pre-trained CNNs is then implemented in ResGNet framework. Though the pretrained networks can be directly used for classification, the performance can be further improved by introducing GNN. After feeding pre-trained networks with images, features can then be extracted from fully-connected layers in the network. We then take each feature as one node in graph G to build graph representation for GNN,. Graph G is paired with (V, E), where V denotes nodes in the graph while E is the set of edges [35] . For a node classification task, v i 2 V i 2 R N À Á for each node in the graph while N is the number of nodes in the graph G. Therefore, 2 E stands for the edge between node v i and v j . As E is abstract, the adjacency matrix A2 R NÂN is used to depict the edges betweeen nodes. When there is an edge between nodes i and j, the corresponding location in the adjacency matrix Aði; jÞ is set to be one. We assume that there is an edge when the node falls into the top k nearest neighbours of another node according to Euclidean distance as shown in Fig. 1 . Compared to other graph generation model [50] , our method of constructing structured graphs is simple and straightforward. Instead of iteratively updating nodes in the graph, we predefined the number of nodes according to batch size N while edges are updated concerning the adjacency matrix. Let the extracted features of our chest CT images be X 2 R NÂM , X i and X j 2 R 1ÂM corresponds to node i and node j in X, then distance Dis ij between node i and node j is calculated according to: By introducing a knn search method, which looks for top k nearest neighbours with minimum Euclidean distance, we can find the closest top k nearest neighbours. Given node i and a vector of reals we can find and record k neighbours by simply sorting Dis i . We used knn(X i ) to refer to the k nearest neighbours of node X i . Therefore, the adjacency matrix is built as follows: Correspondingly, a degree matrix D that has the same dimensions as A can be calculated according to the following equation: where D ii is the trace of degree matrix D. Given the features X 2 R NÂM in a GCN with multi-layers, node features are updated according to the following rule: F hþ1 2 R NÂM hþ1 and F h 2 R NÂM h stands for the nodes' feature representation in (h + 1)th layer and hth layer, respectively. is the weights between hth layer and (h + 1) layer. d(Á) is the softmax activation function, which can be defined as: A $ 2 R NÂN indicates the normalized adjacent matrix A. The process of normalizing A can be demonstrated as [51] : As GNNs are usually no more than two layers [34, 51] , we build a one layer GNN in this work. Bias term b is introduced to improve the performance of fitting. Therefore, we have: Specifically, which is the features extracted by pre-trained CNNs. The combination of features extracted by pre-trained networks and graph is the key of GNN in the proposed framework. However, we found it effective by simply multiplying A with the extracted features, which means: F combined is then used as the input of GNN to optimize W 0 and b 0 . The error between F 1 and the expected output Y can be written as: where Y j is the ture category of feature j and F 1 j is the predicted category of feature j. W 0 is then updated according to: where b is the momentum rate. By training GNN with the combined input F combined , we finally have a model under ResGNet framework. The pseudocode code of ResGNet model is given in Algorithm 1. Step 1: Load pre-trained networks trained on ImageNet; Step 2: Replace top layers with new top layers for the classification task; Step 3: Train modified pre-trained network on the target dataset with predefined hyperparameters; Step 4: Generate features through activation layers in the fine-tuned network; Step 5: Divide the features into batches; Step 6: Find top k nearest neighbours for each feature in the batches and build graphs; Step 7: Combine features with graph representation by multiplying features with the normalized adjacency matrix A; Step 8: Train GNN in ResGNet with the combined features and update parameters. We, therefore, implemented ResGNet-C under our proposed ResGNet framework. Simultaneously, we then have ResNet101-C for preliminary classification and feature extraction and NNet-C for further classification and comparison. Flow chart of constructing three models is shown in Fig. 2 . To obtain the pre-trained network under ResGNet framework, we choose to use ResNet101 to build ResNet101-C. The reason why we use ResNet101 as the backbone here is that ResNet101 showed powerful performance on the image classification task [6] . Same as other networks, ResNet101 includes convolution, batch normalization and pooling units. Convolution is implemented by multiplying images with sliding windows, namely convolution kernels. Usually, the group of convolution kernels increases and the size of output features decreases while the network goes into deeper. Convolutions are usually stacked to form convolution blocks where features within the block remain the same size as the input while the size shrinks due to a larger stride of convolution and pooling. Given the size of the input to convolution is where C GI is the number of channels of the input, the size of convolutional kernels is defined as where C GO is the number of channels of the output. If we have the stride of convolution C S and assume the size output is then the numerical relationship between the size of output and input and kernels are: Similarly, the size of features shrinks as well after max-pooling when the stride is greater than 1. By doing so, the more abstract features with much fewer dimensions can be extracted. However, what makes ResNet different from other networks was the proposed residual learning. Given identity Z and its mapping F(Z), it was assumed that the underlying mapping of features H(Z) after a series of stacked layers was related to F(Z) and Z by: H(Z), therefore, can be represented by: It was believed that H(Z) can then be optimized by optimizing mapping F(Z). Adding of F(Z) and Z is implemented by shortcut connections that skip several convolutional layers and have identity Z added to F(Z) directly. To obtain ResNet101-C for the classification task, we inserted a dropout layer and two fully connected layers with 128-neurons and 2-neurons as output size between FC1000 layer and the Softmax layer in original ResNet101. Given that the original connections in ResNet101 are FC1000-softmax-classification, then the adapted connections are FC1000-Dropout-FC128-FC2-softmax-clas sification. Dropout layer is introduced to prevent the overfitting [52] . Also, FC128 is a transitional layer that prevents significant information loss, which could otherwise happen when inputting the features directly into the final FC2 layer. The pseudocode code of implementing ResNet101-C is given in Algorithm 2. By doing so, we can have the architecture of ResNet101-C in following Fig. 3 . Step 1: Load ResNet101 trained on ImageNet; Step 2: Remove the top layers above FC1000 layer; Step 3: Add a dropout layer as new top layers; Step 4: Add FC128 and FC2 as new top layers; Step 5: Add the softmax layer and classification layer for output, which gives ResNet101-C; Step 6: Training ResNet101-C with the training set of COVID-19. for i = 1: number_of_epochs Step A: shuffle images in the train set Step B: divide train set into batches Step C: train ResNet101-C with each batch end To provide a fair comparison with our GNN in ResGNet, we proposed a simple neural network NNet-C that has the same structure as our GNN while the main difference is the input. Given a batch of images X 2 R HÂWÂ3ÂN , H, W, N stands for height, width, batch size, respectively. Then the output features X 0 of ResNet101-C can be obtained by: where the features X 0 2 R NÂ128 . ResðÁÞ is the operation of feature extraction by applying ResNet101-C. For NNet-C, the input is the features extracted by ResNet101-C in the layer FC128. Our GNN in ResGNet shares the same architecture of NNet-C while NNet-C is also an independent classifier. The architecture of NNet-C is shown in Fig. 4 . For COVID-19 detection, we term our model as ResGNet-C that was implemented under the proposed framework ResGNet. ResNet101-C is used to provide GNN in the proposed ResGNet-C with high-level features. ''ResGNet101-C-FC128" means only layers including layer FC128 and before in ResGNet101-C in Fig. 5 are reutilized in ResGNet-C. In Fig. 5 , the asterisk in the circle means the production of the adjacency matrix of graphs and features. The proposed GNN in ResGNet has the same structure as NNet-C while the main difference is GNN takes the combined features as input while the input of NNet-C is only the high-level features. When training our proposed ResGNet-C, images have to be divided into batches while the size of batches N can be customized. Empirically, we set the batch size to be 2 P , N = 2 P . The details about choosing P will be presented in the experimental section. However, the number of images in the data set doesn't necessarily to be divisible by the batch size N. Images in the second last batch will be reused in the last batch to make the size of the last batch to be N. The detailed process of acquiring each batch in the data set is shown in Fig. 6 . The top red bar denotes the sequence of images in the data set, which could either be the training set or test set. Given the number of images in the data set is O, the batch size is N, the number of batches where dÁe is the ceiling operation. Each batch of images is first forwarded to ResNet101-C for feature acquisition. Then the graph within features is built by the aforementioned knn algorithm. By multiplying the normalized adjacency matrix with the features, the graph-combined features can then be forward to the classification layer to calculate the errors. The errors are then back-propagated to update the weights in GNN. The data flow of training data in the training period of three proposed models is shown in Fig. 7 . The pseudocode of training ResGNet-C is given in Algorithm 3. Step 1: Generate training features through ResNet101-C; Step 2: for k = 1: number_of_epochs for n = 1: S Tr (S Tr is the number of batches in the training set, which is calculated through equation (23) Step A: Construct relation graph in each batch of features; Step B: Combine features with graph; Step C: forward combined features to GNN; Step D: backwards the errors to update weights in GNN; end end Step 3: Save and store the weights of ResGNet-C. In the inference phase, we followed a similar pattern to build and embed graphs. When inferring new images with the trained ResGNet-C, however, the size of batches is suggested to remain the same as the setting for the training set to maintain the best performance of the trained model. If there are no sufficient images for test-ing, samples from the training set can be utilized to form at least one batch of images. Details of inference are given in Algorithm 4. Step 1: Load trained ResGNet-C; Step 2: Generate testing features through ResNet101-C; Step 3: Divide images in the test set into S Te batches according to equation (23). Step 4: Get the predicted categories for images in the test set. for n = 1: S Te Step A: Construct graph for each batch of features; Step B: Combine features with the graph; Step C: forward combined features to GNN; Step D: Get the predicted categories of each batch; end Step 5: Calculate the accuracy by comparing the predicted categories to real categories. When measuring the performance of our proposed models, five indicators including specificity, sensitivity, accuracy, precision, and Specificity reflects how our models perform on recognizing normal images from the test set and can be calculated through: Sensitivity is the percentage of true positive (COVID-19 confirmed pneumonia) images that are correctly recognized and can be defined as: Neurocomputing xxx (xxxx) xxx Accuracy demonstrates the capability of our model on correct classification and can be written as: Precision describes the percentage of predicted patients are true patients by F1 score measures the classification ability: In ResGNet framework, N determines the sizes of graphs to be built while k determines the number of neighbours that should be consdered when calculating the adjacency matrix A in the graph. Therefore, the choice of k and N should be carefully chosen while training ResGNet-C. To choose the best k and N for our ResGNet-C, we vary both k and N until best k and N found. We use overall accuracy to measure the performance of GNN when different k and N are introduced. where C(k, N) means the overall accuracy of GNN on test set when k and N are used as the number of neighbours and batch size respectively. k* and N* are the ideal values of k and N. The reason why we use the accuracy on test set is that we aimed at designing models that performed best on unseen data. The search domain of two variables are set as where k LB and k UB stands for the lower bound and upper bound of k. N LB and N UB stands for the lower bound and upper bound of N. The relationship between k and N is that k < N. In total, seven variables, shown in Table 1 , are defined in our algorithm. Details of searching for k* and N* can be seen in Algorithm 5. (30) and (31) . Procedures: Step 0: Variable Allocation See Table 1 . Step 1: Initialization Initialize k*, k with k LB , k* = k = k LB Initialize N*, N with N LB . N* = N = N LB Initialize variable Current_Accuracy = 0 Set the increment stride of k to be k s and the stride of N to be N s . Step 2: Retrain Train GNN with k, N and obtain C(k, N) on test set; Step 3: Update Current_Accuracy & k* & N* if C(k, N) > Current Accuracy % find a better solution Current Accuracy ¼ Cðk; NÞ; k* = k; N* = N; end Step 4: Update k if (k + k s < k UB ) && (k + k s < N) k = k + k s ; Then go back to Step 2; else Go to Step 5; end Step 5: Update N if N + N s