key: cord-0987020-3fqy8h4f
cord_uid: 3fqy8h4f
doc_id: 987020
sha: 12a8aa9c7efa708bfedf51eaf37a56942ddf87da
title: COVID-19 Detection from X-ray Images using Multi-Kernel-Size Spatial-Channel Attention Network
authors: Fan, Yuqi; Liu, Jiahao; Yao, Ruixuan; Yuan, Xiaohui
date: 2021-06-04
journal: Pattern Recognit
DOI: 10.1016/j.patcog.2021.108055

Novel coronavirus 2019 (COVID-19) has spread rapidly around the world and is threatening the health and lives of people worldwide. Early detection of COVID-19 positive patients and timely isolation of the patients are essential to prevent its spread. Chest X-ray images of COVID-19 patients often show characteristics such as multifocal involvement, bilateral ground-glass opacities, and patchy reticular opacities. It is crucial to design a method to automatically identify COVID-19 from chest X-ray images to help diagnosis and prognosis. Existing studies on the classification of COVID-19 rarely consider the role of attention mechanisms in the classification of chest X-ray images and fail to capture the cross-channel and cross-spatial interrelationships in multiple scopes. This paper proposes a multi-kernel-size spatial-channel attention method to detect COVID-19 from chest X-ray images. Our proposed method consists of three stages. The first stage is feature extraction. The second stage contains two parallel multi-kernel-size attention modules: multi-kernel-size spatial attention and multi-kernel-size channel attention. The two modules capture the cross-channel and cross-spatial interrelationships in multiple scopes using multiple 1D and 2D convolutional kernels of different sizes to obtain channel and spatial attention feature maps. The third stage is the classification module. We integrate the chest X-ray images from three public datasets for evaluation: the COVID-19 Chest X-ray Dataset Initiative, the ActualMed COVID-19 Chest X-ray Dataset Initiative, and the COVID-19 radiography database.
Experimental results demonstrate that the proposed method improves the performance of COVID-19 detection and achieves an accuracy of 98.2%.

Novel coronavirus 2019 (COVID-19) emerged and spread rapidly worldwide in late 2019 [1]. It poses a great threat and challenge to human beings around the world. To date, more than 50 million people have cumulatively been infected with COVID-19, with a death toll upwards of 1.3 million worldwide. Early detection and isolation of infected patients is an effective way to prevent the spread of COVID-19. Conventional methods for COVID-19 detection are unable to meet the needs of the rapidly increasing number of infected patients, and the number of medical specialists is insufficient to meet the demand for healthcare [2]. There is an urgent need for an effective, automatic COVID-19 screening method. Images such as X-rays have been used for the diagnosis of COVID-19; an example is shown in Fig. 1. Fig. 1(a) shows an X-ray image with hazy opacity caused by COVID-19 infection. Fig. 1(b) shows an X-ray image of a healthy subject, in which the chest area is clear.

Apostolopoulos et al. [14] applied deep neural network models, e.g., VGG-Net, to classify COVID-19. Ozturk et al. [15] proposed a deep method, DarkCovidNet, for detecting COVID-19; the model uses fewer convolutional layers and kernels. However, the performance of these methods is far from satisfactory. The extracted features may come from regions outside the lungs that also appear fuzzy, e.g., certain soft tissues. It is, hence, necessary to develop a method that extracts features from regions of great potential for classification.

Attention mechanisms have been developed for image classification and segmentation tasks to improve performance. However, when channel and spatial attention are executed sequentially, important features presented by X-ray images are likely degraded, which leads to inferior performance.
Moreover, a kernel of fixed size fails to capture features of different scales and properties.

In this paper, we propose a deep learning method for processing chest X-ray images to assist COVID-19 detection. Our method extracts features by introducing attention and variable kernel sizes. The main contribution of this paper is the multi-kernel-size spatial-channel attention method (MKSC) to analyze chest X-ray images for COVID-19 detection. Our proposed method integrates a feature extraction module, a multi-kernel-size attention module, and a classification module. We use X-ray images from three public datasets for evaluation, which, to the best of our knowledge, is the most comprehensive evaluation of its kind.

The rest of the paper is organized as follows. Section 2 presents related work on deep learning methods for medical image classification and attention mechanisms. Section 3 describes our proposed method in detail.

Zhu et al. [22] proposed DeepLung, a fully automated lung computed tomography (CT) cancer diagnosis system consisting of two parts: nodule detection to determine the location of candidate nodules, and classification to classify candidate nodules as benign or malignant. Specifically, a 3D Faster region-based convolutional neural network (R-CNN) was designed to detect nodules and efficiently learn nodule features using a 3D two-channel module and a U-Net-like encoder-decoder structure. The system achieved good results on nodule identification in the LIDC-IDRI dataset. Zhu et al. [21] proposed end-to-end training of a deep multi-instance network to classify mammogram X-ray images without annotations of lesion locations.

The final attention feature map is obtained as

F* = Cnct(F_1, F_2),

where F_1 and F_2 represent the channel and spatial attention feature maps, respectively, Cnct represents the concatenation of the two obtained attention feature maps F_1 and F_2, and F* ∈ R^{H×W×2C} is the final attention feature map.
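The fusion of the two parallel attention branches amounts to a single concatenation along the channel axis. A minimal NumPy sketch, assuming channel-last (H, W, C) feature maps and using randomly generated stand-ins for the two attention feature maps:

```python
import numpy as np

# Stand-ins for the attention feature maps F1 (channel branch) and
# F2 (spatial branch); both share the backbone feature map shape (H, W, C).
H, W, C = 7, 7, 16
F1 = np.random.rand(H, W, C)   # channel attention feature map
F2 = np.random.rand(H, W, C)   # spatial attention feature map

# Cnct: concatenate the two attention feature maps along the channel axis,
# yielding the final attention feature map F* in R^{H x W x 2C}.
F_star = np.concatenate([F1, F2], axis=-1)
assert F_star.shape == (H, W, 2 * C)
```

Because concatenation (rather than summation) is used, both branches' features are preserved intact for the downstream classification module.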
To take full advantage of the relationship between the feature planes generated by each convolution kernel, we first squeeze the global spatial information into a channel descriptor. That is, global average pooling is performed for each plane of the high-level feature map, which can be expressed as

y_c = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} F_c(i, j),

where y_c denotes the channel descriptor obtained by squeezing, H × W is the spatial dimension of the feature map, and F_c(i, j) represents each pixel (i, j) in the feature plane. We compute the relationships across channels using three sets of coefficient matrices of different sizes, respectively.

We implement the above strategy using 1D convolution, such that the process can be updated with the end-to-end training of the neural network. The process is as follows. Firstly, we squeeze each feature plane into a channel descriptor A_1 ∈ R^{1×1×C} using global average pooling on the high-level feature map F ∈ R^{H×W×C} extracted in the first stage. Secondly, Conv1D operations are performed on the obtained channel descriptor using convolutional kernels of sizes 1, 3, and 5 to obtain three different channel descriptors, which are then fused to produce the channel attention weights.

For the spatial attention branch, we analogously squeeze the channel information at each spatial location into a spatial descriptor:

y_s = (1 / C) Σ_{k=1..C} F_s(k),

where y_s denotes the spatial descriptor obtained by squeezing, C is the number of channels in the feature map F, and F_s(k) represents the local pixel value of each channel at a specific spatial location.

We use three sets of coefficient matrices of different sizes to compute the relationships between cross-spatial locations and calculate the spatial attention weights for the given spatial descriptor y_s ∈ R^{H×W×1}.

We implement the process above using 2D convolution, such that the process can be updated with the end-to-end training of the neural network. Firstly, we squeeze the feature information of the C pixels at each spatial location of the high-level feature map F ∈ R^{H×W×C} extracted in the first stage into a representative descriptor.
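The channel-attention branch (global average pooling, then multi-kernel-size 1D convolutions over the channel descriptor) can be sketched in a few lines of NumPy. The kernel sizes 1, 3, and 5 follow the text; fusing the three descriptors by summation and squashing with a sigmoid are assumptions, since the exact fusion operator is not recoverable from this excerpt:

```python
import numpy as np

def conv1d_same(x, k):
    """1D 'same' cross-correlation of descriptor x with an odd-length kernel k."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, kernels):
    """Multi-kernel-size channel attention on a (H, W, C) feature map F."""
    y = F.mean(axis=(0, 1))                  # global average pooling -> (C,)
    # Conv1D with several kernel sizes captures cross-channel relations
    # over several ranges; summing the outputs is an assumed fusion.
    mixed = sum(conv1d_same(y, k) for k in kernels)
    weights = sigmoid(mixed)                 # channel attention weights, (C,)
    return F * weights                       # broadcast over H and W

F = np.ones((4, 4, 3))                       # toy feature map
kernels = [np.ones(1), np.ones(3)]           # toy kernels of sizes 1 and 3
out = channel_attention(F, kernels)
```

In a real network the 1D kernels would of course be learned end-to-end rather than fixed; the sketch only illustrates the data flow of the squeeze-and-reweight step.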
We then aggregate all descriptors into a single spatial descriptor A_2 ∈ R^{H×W×1}. Secondly, we use 2D convolution operations to learn the relationships between pixels as well as between local receptive fields, such that the network is able to better learn the relationships in the spatial dimension. The spatial attention map is then computed from these multi-kernel convolution outputs in the same manner as in the channel branch.

We integrate the chest X-ray images from three public datasets: the COVID-19 Chest X-ray Dataset Initiative [26], the ActualMed COVID-19 Chest X-ray Dataset Initiative [27], and the COVID-19 radiography database [28]. Our dataset consists of 500 COVID-19 X-ray images and 500 non-COVID-19 X-ray images. All images are resized to 224 × 224 pixels. We use cross-validation to evaluate the proposed MKSC by first dividing the dataset into ten random folds, each consisting of 100 randomly selected COVID-19 and non-COVID-19 X-ray images. We then use each fold as a validation set and a test set, respectively, as shown in Fig. 5. We perform 30 experiments on each fold and take the average of the results as the final results. We set the initial learning rate to 10^-4. When the validation accuracy does not improve three times consecutively, the learning rate drops to half of the previous rate. The number of iterations is 30. The parameter settings are summarized in Table 1.

Based on the confusion matrix, we calculate the five performance metrics commonly used in deep learning classification tasks to evaluate the performance of MKSC: accuracy, precision, recall (sensitivity), specificity, and F1-score. We conduct the experiments and obtain the confusion matrices shown in Fig. 6. We can see that our proposed MKSC has high recognition accuracy for both COVID-19 and non-COVID-19 X-ray images. In most cases, the TP and TN rates are above 98%, which shows that the proposed MKSC can correctly classify samples. In the following experiments, we take the average of the results in the confusion matrices over all folds as the final results.
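The five metrics follow directly from the four confusion-matrix counts of a binary classifier. A minimal sketch (the counts below are illustrative toy numbers, not the paper's results):

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Accuracy, precision, recall (sensitivity), specificity and F1-score
    from the four confusion-matrix counts of a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

# Toy counts: 98 true positives, 3 false positives,
# 97 true negatives, 2 false negatives.
acc, prec, rec, spec, f1 = metrics_from_confusion(tp=98, fp=3, tn=97, fn=2)
```

Note that F1 can equivalently be written 2·TP / (2·TP + FP + FN), which avoids the intermediate precision/recall division.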
The attention modules help the network learn where and what to focus on, and hence obtain higher accuracy. They not only guide the network to the regions to focus on but also suppress the shadow and skeletal noise in the images that is not helpful for the classification. In addition, the convolution operations in MKSC extract both cross-channel and cross-spatial features. MKSC captures the cross-channel and cross-spatial interrelationships in multiple ranges using multiple 1D and 2D convolutional kernels of different sizes to obtain channel and spatial attention feature maps. That is, MKSC avoids considering only the relationships between either the channels or the spatial locations.

Table 2 also shows that the MKSC method achieves better performance than CBAM. CBAM uses a hybrid attention mechanism in which channel and spatial attention are executed sequentially, whereas MKSC uses parallel multi-kernel-size channel and spatial attention modules to integrate the cross-channel and cross-spatial relationships in multiple ranges.

As presented in Table 2, ECA-Net achieves the second highest accuracy but has a high standard deviation, which indicates large performance fluctuation. The accuracy of VGG-Net is low, but its performance fluctuation is the smallest. Our model achieves the greatest average accuracy (98.17%) with a small standard deviation, which demonstrates superior performance and robustness.

In the following, we compute the precision, recall, specificity, and F1-score of each method via the confusion matrices. The performance of the methods in terms of the four metrics is shown in Fig. 7. It can be observed from the figures that the proposed MKSC method improves the precision, recall, specificity, and F1-score over the benchmark methods.

We add Gaussian noise to the images to evaluate the robustness of the compared methods. The noise distribution has zero mean and a standard deviation of 0.6. Table 3 reports the performance in terms of accuracy, precision, recall, specificity, and F1-score. With added noise, the performance of all models decreases, and the performance of the benchmark models drops significantly.
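The corruption step of the robustness experiment can be sketched as follows; it assumes pixel values are already normalized, and whether the paper clips the noisy values back into a valid range is not stated here, so no clipping is applied:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility

def add_gaussian_noise(image, std=0.6, mean=0.0):
    """Corrupt an image with additive zero-mean Gaussian noise
    (std = 0.6, as in the robustness experiment)."""
    noise = rng.normal(loc=mean, scale=std, size=image.shape)
    return image + noise

clean = np.zeros((224, 224))            # toy stand-in for a 224x224 X-ray image
noisy = add_gaussian_noise(clean)
```

Evaluating each trained model on such noisy copies of the test fold then yields the degraded metrics reported in Table 3.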
In contrast, the proposed MKSC method achieves the best performance in all metrics, and its performance remains almost the same, which demonstrates the robustness of our proposed method.

In this paper, we proposed a multi-kernel-size spatial-channel attention method for the automatic detection of COVID-19 from chest X-ray images.

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References
[1] A novel coronavirus from patients with pneumonia in China.
[2] Deep learning for image-based cancer detection and diagnosis - A survey.
[3] Deep learning for the identification of bruised apples by fusing 3D deep features for apple grading systems.
[4] Automated invasive ductal carcinoma detection using deep transfer learning with whole-slide images.
[5] Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks, in: Medical Imaging.
[6] Dermatologist-level classification of skin cancer with deep neural networks.
[7] Deep learning ensembles for melanoma recognition in dermoscopy images.
[8] Learning efficient, explainable and discriminative representations for pulmonary nodules classification.
[9] Automated segmentation of exudates, haemorrhages, microaneurysms using single convolutional neural network.
[10] Automated segmentation of the optic disc from fundus images using an asymmetric deep learning network.
[11] An automatic method for lung segmentation and reconstruction in chest X-ray using deep neural networks.
[12] Segmentation of blurry object by learning from examples.
[13] Atlas-based reconstruction of high performance brain MR data.
[14] COVID-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
[15] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[16] Squeeze-and-excitation networks.
[17] CBAM: Convolutional block attention module.
[18] ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data.
[20] Feature learning to automatically assess radiographic knee osteoarthritis severity, in: Deep Learners and Deep Learner Descriptors for Medical Applications.
[21] Deep multi-instance networks with sparse label assignment for whole mammogram classification.
[22] DeepLung: Deep 3D dual path nets for automated pulmonary nodule detection and classification, in: IEEE Winter Conference on Applications of Computer Vision (WACV).
[23] COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images.
[24] Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network.
[25] COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
[26] COVID-Net Team, Figure1 COVID-19 chest X-ray data initiative.
[27] Actualmed COVID-19 chest X-ray data initiative.
[28] COVID-19 Radiography Database - COVID-19 Chest X-ray Database.
[29] Deep residual learning for image recognition.