key: cord-0701154-mdhnhw76 authors: Li, Qian; Ning, Jiangbo; Yuan, Jianping; Xiao, Ling title: A depthwise separable dense convolutional network with convolution block attention module for COVID-19 diagnosis on CT scans date: 2021-09-08 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2021.104837 sha: e288294ee1339958e0047e62e2292bb552341cc6 doc_id: 701154 cord_uid: mdhnhw76

Coronavirus disease 2019 (COVID-19) has caused more than 3 million deaths and infected more than 170 million individuals all over the world. Rapid identification of patients with COVID-19 is key to controlling transmission and preventing hospital resources from being depleted. Several networks have been proposed to assist radiologists in diagnosing COVID-19 based on CT scans. However, the CTs used in these studies are unavailable to other researchers for further work due to privacy concerns. Furthermore, these networks are too heavyweight to suit the general trend of deployment on computationally limited platforms. In this paper, we aim to solve these two problems. First, we establish a publicly available dataset, COVID-CTx, which contains 828 CT scans positive for COVID-19 across 324 patient cases drawn from three open-access data repositories. To our knowledge, it has the largest number of publicly available COVID-19-positive cases among public datasets. Second, we propose a lightweight hybrid neural network: the Depthwise Separable Dense Convolutional Network with Convolutional Block Attention Module (AM-SdenseNet). AM-SdenseNet synergistically integrates the Convolutional Block Attention Module with depthwise separable convolutions to learn powerful feature representations while reducing parameters to overcome overfitting. Through experiments, we demonstrate the superior performance of AM-SdenseNet compared with several state-of-the-art baselines. This performance can improve the speed and accuracy of COVID-19 diagnosis, which is extremely useful for controlling the spread of infection.

Coronavirus disease 2019 (COVID-19), caused by the virus named SARS-CoV-2 by the International Committee on Taxonomy of Viruses (ICTV), is a highly infectious respiratory disease. More than 170 million confirmed COVID-19 cases and 3 million deaths had been reported in roughly 200 countries and territories as of June 7, 2021. Rapid identification of patients with COVID-19 has been recommended by the World Health Organization (WHO) to control transmission and prevent hospital resources from being depleted. Reverse transcription-polymerase chain reaction (RT-PCR) serves as the gold standard for COVID-19 diagnosis. However, the total positive rate of RT-PCR for throat swab samples was reported to be approximately 30%-60% at initial presentation [1]. Additionally, RT-PCR takes 4-6 hours to provide results, which is much slower than the spread of COVID-19. As a result, some infected patients cannot be identified early and continue to infect others unintentionally. To mitigate the inefficiency of RT-PCR, chest computed tomography (CT) has been used to supplement RT-PCR testing of patients with suspected COVID-19 [2]. Several studies [3, 4] have shown that CT scans manifest clear radiological findings of COVID-19, such as ground-glass opacities and consolidations, and CT has shown promise as a more efficient testing tool for COVID-19 diagnosis, with higher sensitivity [5]. However, a chest CT contains hundreds of slices, which takes specialists a long time to read.
Time pressure, heavy workloads, and a lack of experienced radiologists create challenges for the imaging-based analysis of COVID-19 [6]. Fortunately, Artificial Intelligence (AI) has been widely applied to medical images, for example to lung nodules [7], tuberculosis [8], breast cancer [9], and tumors [10]. Rapidly developed AI-based automated CT image analysis tools can achieve high accuracy in detecting coronavirus-positive patients. To alleviate the burden on medical professionals, there have been increasing efforts to develop computer-aided detection (CAD) systems that assist radiologists in diagnosing COVID-19 based on CT scans [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. CAD systems for COVID-19 detection are mainly divided into two modules: segmentation and classification. Segmentation aims to delineate the lung regions where lesions are located; the segmented regions can then be fed to a classification module for COVID-19 detection. Segmentation is an essential step in image processing and analysis for the assessment and quantification of COVID-19. It delineates the regions of interest (ROIs), e.g., lung and infected regions, in the CT images. To segment ROIs in CT, U-Net [21], U-Net++ [22], and V-Net [23] are widely used. In the classification module, several studies aim to separate COVID-19 patients from normal subjects. ResNet [24] is the most widely used network for diagnosing COVID-19 from CT scans [14, 15, 16, 17, 18]. Beyond the ResNet backbone, some networks further incorporate attention mechanisms and a feature pyramid network (FPN) [25] to focus on important features [14, 16]. Given the problem of limited datasets, studies adopt strategies such as weak labeling [11], transfer learning [19], human-in-the-loop annotation [13], and data augmentation [20] to improve the evaluation metrics of CAD systems.

Although these studies have achieved good results, two major hurdles remain: (1) the CT scans used in these studies are unavailable for public access and use due to privacy concerns, so these works are difficult to reproduce and extend, which greatly hinders the research and development of deep learning methods; (2) the networks are too heavyweight to suit the general trend of deployment on computationally limited platforms. To address the first problem, we establish a publicly available dataset, COVID-CTx. It is composed of data collected and modified from three open-access repositories: COVID-SIRM [26], COVID-Seg [27], and COVID19-CT [19]. To the best of our knowledge, COVID-CTx has the largest number of publicly available COVID-19-positive cases among public datasets. All CT scans in these three repositories are confirmed by senior radiologists. To address the second problem, we propose a lightweight hybrid neural network: the Depthwise Separable Dense Convolutional Network with Convolutional Block Attention Module (AM-SdenseNet). AM-SdenseNet uses a Dense Convolutional Network (DenseNet) [28] as its basic network. Compared with ResNet, DenseNet is better suited to deployment on computationally limited platforms. At the same time, AM-SdenseNet synergistically applies the Convolutional Block Attention Module (CBAM) [29] to each DenseNet block with residual learning, which lets the network focus on important features and suppress unnecessary ones. In addition, we replace some regular convolutions in DenseNet with depthwise separable convolutions [30] to reduce training parameters while keeping network performance.
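As a quick, concrete illustration of that last point (our own example, not code from the paper), the following Keras snippet compares the parameter counts of a regular 3x3 convolution and its depthwise separable counterpart on a 64-channel feature map:

```python
# Parameter counts: regular 3x3 convolution vs. depthwise separable convolution.
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(512, 512, 64))
regular = Model(inp, layers.Conv2D(64, 3, padding="same")(inp))
separable = Model(inp, layers.SeparableConv2D(64, 3, padding="same")(inp))

print(regular.count_params())    # 36928: 3*3*64*64 weights + 64 biases
print(separable.count_params())  # 4736: 3*3*64 depthwise + 64*64 pointwise + 64 biases
```

The roughly eightfold reduction seen here is the kind of saving AM-SdenseNet exploits throughout its dense blocks.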
The paper is organized as follows. Section 2 reviews related work on existing accessible COVID-19 datasets, COVID-19 diagnosis systems, depthwise separable convolutions, and attention modules. Section 3 presents the COVID-CTx dataset, data preprocessing, and the AM-SdenseNet architecture. In Section 4, we compare the performance of AM-SdenseNet with six state-of-the-art networks: MobileNet [31], InceptionV3 [32], ResNet50, VGG16 [33], DenseNet169, and DenseNet121. Finally, Section 5 concludes the paper and discusses future directions.

Due to privacy issues, few datasets with large numbers of COVID-19 CT scans have been made available for public access and use to date. The largest existing COVID-19 datasets are COVID-SIRM, COVID-Seg, and COVID19-CT. The Italian Society of Medical Radiology (SIRM) has publicly provided 100 CT scans across 68 patient cases [26]. Each case in COVID-SIRM provides detailed information about the lesions in the CT images, and some cases include follow-up scans taken during treatment at intervals of about 4 days. COVID-Seg [27] contains 40 labeled COVID-19 CT scans, in which the left lung, right lung, and infections are labeled by two radiologists and verified by an experienced radiologist. The University of California constructed the publicly available COVID19-CT dataset [19], which contains 349 CT scans positive for COVID-19 from 216 COVID-19 cases; all CTs were collected from public sources such as medRxiv, bioRxiv, and journals or papers related to COVID-19. Other existing COVID-19 datasets consist mainly of X-ray images. For example, the COVID-19 Radiography Database, generated by Chowdhury et al. [20], comprises 1200 COVID-19-positive images and 1341 normal images. It is worth noting that current public datasets still offer a very limited number of images for training and testing models, and their quality is often insufficient.

During the outbreak of COVID-19, there have been increasing efforts to build diagnosis systems that screen for COVID-19 based on CT scans. Segmentation is an essential step in these systems for assessing and quantifying COVID-19. Popular segmentation networks for CT scans in COVID-19 applications include U-Net, U-Net++, and V-Net. U-Net, a U-shaped network, is a popular technique for segmenting lung regions in medical images. For example, Zheng et al. [11] obtain all CT lung masks through a pre-trained U-Net in a COVID-19 application. U-Net variants, meanwhile, have been developed that achieve better results in COVID-19 image segmentation. Jin et al. [18] propose a two-stage pipeline for diagnosing COVID-19 in CT images, in which the whole lung is first detected by an efficient network based on U-Net++. U-Net++ can greatly improve segmentation performance because it inserts a nested convolutional structure between the encoding and decoding paths of U-Net. In addition, CT provides high-quality 3D images for detecting COVID-19. V-Net, notably, is a 3D segmentation approach that applies volumetric convolutions instead of processing the input volume slice by slice. To fully utilize the spatial information of CT, Shan et al. [13] propose a 3D segmentation system, VB-Net, which combines V-Net with the bottleneck structure [24]. All of these networks can achieve strong segmentation performance; however, they are difficult to train without adequate labeled data.
In COVID-19 CT segmentation, sufficient labeled data is often unavailable because manual delineation of lung regions is labor-intensive and time-consuming. Segmentation can be used to preprocess the CTs, and classification then incorporates the segmentation results into the diagnosis. Many studies aim to separate COVID-19 patients from non-COVID-19 subjects. For example, Jin et al. [15] establish a deep learning system for COVID-19 detection that outperforms five radiologists on more challenging tasks while running about two orders of magnitude faster, which demonstrates that CAD systems can help radiologists accelerate COVID-19 diagnosis. ResNet is the most widely used network in CAD systems. Xu et al. [14] propose a diagnostic model based on ResNet18 to classify normal, viral-pneumonia, and COVID-19 CT scans, and further add a location-attention mechanism to improve overall accuracy, reaching 86.7%. Li et al. [34] preprocess the 2D slices to extract lung regions using U-Net and apply a ResNet50 model combined with max-pooling for diagnosis; the model achieves a specificity of 96%, a sensitivity of 90%, and an AUC of 0.96 in identifying COVID-19. Similarly, Jin et al. [18] propose a U-Net++-based segmentation model for locating lesions and a ResNet50-based classification model for diagnosis; the combined model reaches a specificity of 92.2% and a sensitivity of 97.4%. Given the problem of limited datasets, He et al. [19] propose a self-supervised transfer learning method (Self-Trans) that integrates contrastive self-supervised learning with transfer learning to learn powerful and unbiased feature representations and reduce overfitting; on the test set it achieves an AUC of 0.94 even though only a few hundred CT scans are used for training. Zheng et al. [11] propose a weakly supervised network, a 3D deep CNN (DeCoVNet), which can accurately predict the probability of COVID-19 infection in chest CT volumes without lesion labels for training. DeepPneumonia, proposed by Song et al. [16], uses a ResNet50 backbone with an FPN and an attention module as the feature extractor and distinguishes COVID-19 patients from others with an excellent AUC of 99% and a sensitivity of 93%. Additionally, these studies [11, 19, 20] all apply data augmentation to overcome overfitting. In summary, the above approaches have produced generally promising results for CT-based COVID-19 diagnosis. However, most of these classification techniques were trained on large datasets, and the CT scans used are unavailable for public access and use, which greatly hinders the research and development of deep learning methods. In addition, these networks are too heavyweight to suit the general trend of deployment on computationally limited platforms.

To achieve higher accuracy, networks have become much deeper and more complicated [33]. The general trend, however, is toward performing recognition tasks in a timely fashion on computationally limited platforms [31]. Depthwise separable convolutions, first proposed by Sifre et al. [30], aim to reduce training parameters by factorizing a convolution into separate spatial (per-channel) and cross-channel steps. Later, Inception V1 and Inception V2 use a depthwise separable convolution as the first layer to lighten the network [35, 36]. Howard et al.
[31] introduce MobileNets, based on depthwise separable convolutions; this family of networks is mostly used in mobile and embedded vision applications, largely because it is lightweight. Jin et al. [37] and Wang et al. [38] also work on reducing the size and computational cost of convolutional neural networks. Chollet [39] presents a novel architecture based on depthwise separable convolutions, named Xception, which shows large gains on the JFT dataset [40].

It is well known that attention plays a crucial role in human perception [41]. Several studies have explored attention mechanisms to improve the performance of CNNs. Wang et al. [40] propose an encoder-decoder style attention module, the Residual Attention Network, which not only performs well but is also robust to noisy inputs; however, because it computes 3D attention maps directly, it incurs considerable computational and parameter overhead. Hu et al. [42] propose the Squeeze-and-Excitation module to exploit inter-channel relationships, but because it computes channel-wise attention only from globally average-pooled features, it misses spatial attention. Woo et al. [43] introduce CBAM, which learns channel attention and spatial attention separately. In their analysis, the authors show that CBAM outperforms other attention modules in terms of accuracy improvement. Notably, the attention mechanism has been reported to be an efficient localization method in screening, which makes it well suited to COVID-19 applications [14, 44].

Datasets play a critical and essential role in deep learning models. However, few datasets with large numbers of COVID-19 CT scans have been available for public access and use to date. To address this problem, we establish a publicly available dataset, COVID-CTx. Though the largest of its kind, COVID-CTx is still too small to train large networks, so we propose the lightweight hybrid neural network AM-SdenseNet to combat overfitting. The next subsections introduce COVID-CTx, data preprocessing, and the AM-SdenseNet architecture in turn.

COVID-CTx contains 828 CT scans positive for COVID-19 across 324 patient cases. Because CT scans from the same patient are visually similar, the number of distinct patient cases matters as much as the raw number of scans; our dataset has the largest number of publicly available COVID-19-positive cases and richer COVID-19 features than other public datasets. To build COVID-CTx, we collected COVID-19-positive CTs from three open-access data repositories: 100 CTs from COVID-SIRM, 379 CTs from COVID-Seg, and 349 CTs from COVID19-CT. In addition, COVID-CTx contains 1000 CT scans negative for COVID-19; these scans, obtained from LUNA16 [45], are either normal or contain lung nodules. Fig.1 shows example CT images from COVID-CTx, of which a) is from COVID-SIRM, b) is from COVID-Seg, and c) is from COVID19-CT.

Lung segmentation is an essential step in COVID-19 diagnosis systems for assessing and quantifying COVID-19, but training segmentation networks for COVID-19 CT is difficult: sufficient labeled data is often unavailable because manual delineation of lung regions is labor-intensive and time-consuming. Here, U-Net and U-Net++ are first trained from scratch on LUNA16 and then fine-tuned on COVID-CTx, for which we produced the lung masks ourselves.
In addition, we chose K-means clustering [46] to obtain lung regions because of its simple principle and easy implementation [47, 48]. Fig.2 shows the results achieved by the three lung segmentation methods. K-means reaches the best segmentation result on COVID-19 images, which demonstrates its effectiveness and its suitability for segmentation tasks with insufficient labeled data. Consequently, we segment lung regions by K-means clustering in this study. The process of lung segmentation is shown in Fig.3, and the specific method is as follows.
(1) The original CT scan is preprocessed to enhance contrast.
(2) We separate the foreground (opaque tissue) and the background (transparent tissue, i.e., the lungs) in the image using K-means.
(3) We use a morphological closing operation to eliminate the residual trachea.
(4) A hole-filling algorithm fills the maximum connected area in the inverted closed image; subtracting the inverted closed image from the filled image then yields the lung mask.
(5) The lung region is extracted by multiplying the mask with the original CT scan.

Though the largest of its kind, COVID-CTx may still lead to overfitting in data-hungry deep learning models. Therefore, two image augmentation techniques (horizontal mirroring and vertical mirroring) are used to generate additional COVID-19 training images, as shown in Fig.4.

As shown in Fig.5, AM-SdenseNet is mainly composed of three AM-Sdense blocks, two transition layers, and one classifier. The AM-Sdense blocks use DenseNet as the basic network structure to extract image features. We replace some regular convolutions in DenseNet with depthwise separable convolutions to reduce training parameters while keeping network performance, and we apply CBAM to the DenseNet portion of each AM-Sdense block with residual learning; this improves the representation of lesion features and suppresses less relevant ones. The layers between two contiguous dense blocks are transition layers, which perform convolution and pooling: each consists of a batch normalization (BN) [36] layer and a 1x1 convolution layer followed by a 2x2 average pooling layer. The output of the last block is fed to the classifier for the final classification prediction; the classifier contains a global average pooling layer, a dropout layer with probability 0.5, and a dense layer with a sigmoid activation function.

We now detail the key components of the AM-Sdense blocks. In our network, the numbers of layers in the three AM-Sdense blocks are 6, 12, and 24, respectively; the growth rate is $k = 24$ and the compression factor is 1. A 6-layer dense block with a growth rate of $k = 24$ is shown in Fig.6. Each layer takes all preceding feature maps as input through direct connections, so the channel count of the input to each following layer increases with the growth rate $k$. The output of the $\ell$-th layer is given by Equation (1):

$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$    (1)

where $x_0, x_1, \ldots, x_{\ell-1}$ are the feature maps produced in layers $0, 1, \ldots, \ell-1$, $[\cdot]$ denotes concatenation of all inputs along the channel axis, and $H_\ell(\cdot)$ denotes four consecutive operations: BN, rectified linear unit (ReLU) [49], convolution (Conv), and depthwise separable convolution (SepConv). The version of $H_\ell$ used here is BN-ReLU-Conv(1x1)-BN-ReLU-SepConv(3x3). Many networks show large gains by using depthwise separable convolutions [35, 36, 39].
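To make Equation (1) and the composite function $H_\ell$ concrete, here is a minimal Keras sketch of one AM-Sdense layer and dense block (our illustration, not the authors' released code; the 4k-channel bottleneck width follows the DenseNet convention and is our assumption, as the paper does not state it):

```python
# One H_l unit: BN-ReLU-Conv(1x1) bottleneck, then BN-ReLU-SepConv(3x3),
# with dense connectivity via channel-wise concatenation (growth rate k = 24).
from tensorflow.keras import layers

def dense_layer(x, growth_rate=24):
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * growth_rate, 1, use_bias=False)(y)  # bottleneck (assumed 4k wide)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.SeparableConv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    return layers.Concatenate()([x, y])  # x_l = [x_0, ..., x_{l-1}, H_l(.)]

def dense_block(x, num_layers):
    # Stacking layers this way grows the channel count by k per layer, as in Eq. (1).
    for _ in range(num_layers):
        x = dense_layer(x)
    return x
```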
Following this line of work, we replace some regular convolutions in DenseNet with depthwise separable convolutions to reduce model complexity as much as possible. A depthwise separable convolution consists of two parts: a depthwise convolution and a pointwise convolution. The depthwise convolution is a spatial convolution performed independently over every channel of the input, and the pointwise convolution is a regular convolution with 1x1 windows that projects the channels computed by the depthwise convolution into a new channel space. Fig.7 shows the schematic diagram. The two convolutions are computed as Equations (2) and (3):

$\hat{G}_{x,y,m} = \sum_{i,j} K_{i,j,m} \cdot F_{x+i,\,y+j,\,m}$    (2)

$G_{x,y,n} = \sum_{m} P_{m,n} \cdot \hat{G}_{x,y,m}$    (3)

where $F$ is the input image, $(x, y)$ indexes pixels, $K$ is the depthwise kernel of size $(k, k, d)$ applied channel-wise, and $K$ and $P$ are the parameters of the separable convolution. The parameter counts of a regular convolution and a depthwise separable convolution are compared in Table 2: per channel they are roughly $k \times k \times d$ and $k \times k + d$, respectively, where $k$ is the kernel size and $d$ the number of channels. When $d$ is much bigger than 1 (as is usually the case), a separable convolution therefore has far fewer parameters than a regular convolution. In this paper, we replace some regular convolutions in DenseNet accordingly.

Considering the limited data and computationally limited platforms, we apply CBAM to DenseNet to focus on important features and suppress less relevant ones. An overview of CBAM is shown in Fig.8. The channel attention module exploits the inter-channel relationships of the features, and the spatial attention module subsequently generates a spatial attention map from their inter-spatial relationships. Given an input feature map $F \in \mathbb{R}^{H \times W \times C}$, the channel attention module uses global average pooling and global max pooling to extract average-pooled and max-pooled features, respectively. Both are passed through a shared multi-layer perceptron (MLP) and summed element-wise to yield a channel attention map $M_c(F) \in \mathbb{R}^{1 \times 1 \times C}$. The channel attention map and the input feature map are multiplied element-wise, producing the channel-refined feature map $F' \in \mathbb{R}^{H \times W \times C}$ as in Equation (4):

$F' = M_c(F) \otimes F$    (4)

where $M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$, $\sigma(\cdot)$ denotes the sigmoid function, and $\otimes$ denotes element-wise multiplication. In the spatial attention module, average pooling and max pooling along the channel axis extract two feature maps, which are concatenated along the channel axis and passed through a convolution layer to produce a spatial attention map $M_s(F') \in \mathbb{R}^{H \times W \times 1}$. The spatial attention map and the channel-refined feature map $F'$ are multiplied element-wise to exploit inter-spatial relationships, and the refined feature map $F'' \in \mathbb{R}^{H \times W \times C}$ is computed as in Equation (5):

$F'' = M_s(F') \otimes F'$    (5)

where $M_s(F') = \sigma(f^{7 \times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')]))$ and $f^{7 \times 7}(\cdot)$ represents a convolution with a 7x7 filter. In our network, we apply CBAM to DenseNet with residual learning, as in Equation (6):

$O(x) = D(x) + A(x)$    (6)

where $x$ is the input of the AM-Sdense block, $D(x)$ is the output of the dense block, $A(x)$ is the output of CBAM, and $O(x)$ is the output of the AM-Sdense block. It is worth noting that residual learning largely avoids interference from the attention mechanism: even if the CBAM output $A(x)$ is 0, $O(x)$ still retains the main features in $D(x)$. As can be seen in Fig.9, CBAM strongly emphasizes lesion features in $A(x)$, so lesions appear more prominent in $O(x)$ than in $D(x)$.
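The following Keras sketch (ours, not the authors' code) implements Equations (4)-(6), reusing `dense_block` from the earlier sketch. Because the residual sum in Equation (6) requires matching shapes, we assume CBAM operates on the dense block output; the MLP reduction ratio of 8 is likewise our assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(f, reduction=8):
    # M_c(F): shared MLP over global average- and max-pooled descriptors (Eq. 4).
    c = f.shape[-1]
    mlp = tf.keras.Sequential([
        layers.Dense(c // reduction, activation="relu"),
        layers.Dense(c),
    ])
    avg = mlp(layers.GlobalAveragePooling2D()(f))
    mx = mlp(layers.GlobalMaxPooling2D()(f))
    m_c = tf.sigmoid(avg + mx)[:, None, None, :]   # shape (batch, 1, 1, C)
    return f * m_c                                 # F' = M_c(F) (x) F

def spatial_attention(f):
    # M_s(F'): 7x7 conv over channel-wise average- and max-pooled maps (Eq. 5).
    avg = tf.reduce_mean(f, axis=-1, keepdims=True)
    mx = tf.reduce_max(f, axis=-1, keepdims=True)
    m_s = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        tf.concat([avg, mx], axis=-1))
    return f * m_s                                 # F'' = M_s(F') (x) F'

def am_sdense_block(x, num_layers):
    d = dense_block(x, num_layers)                 # D(x): dense block output
    a = spatial_attention(channel_attention(d))    # A(x): CBAM-refined features
    return d + a                                   # Eq. (6): O(x) = D(x) + A(x)
```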
When the dataset is limited, conventional shallow CNN models produce better results than deeper models [50]. Therefore, AM-SdenseNet has only three dense blocks, with 6, 12, and 24 layers, respectively; the growth rate is $k = 24$ and the compression factor is 1. AM-SdenseNet takes the segmented lung regions as input images of size 512 x 512 x 3. Before the data enters the first AM-Sdense block, a convolution layer with 48 output channels followed by a max pooling layer is applied to the input images. Between two contiguous dense blocks, a 1x1 convolution followed by 2x2 average pooling changes the feature map sizes. At the end of the third dense block, global average pooling is performed, followed by a dropout layer with probability 0.5. Finally, a dense layer with a sigmoid activation function outputs the probabilities of being COVID-positive or COVID-negative. Table 3 shows the details of the AM-SdenseNet architecture, where each "conv" layer corresponds to the sequence BN-ReLU-Conv and each "sconv" layer corresponds to the sequence BN-ReLU-SepConv.

We empirically demonstrate the effectiveness of AM-SdenseNet on COVID-19 detection and compare it with state-of-the-art architectures such as DenseNet and its variants. The following subsections introduce the experimental settings, the influence of CBAM and depthwise separable convolutions on AM-SdenseNet, and comparisons of our approach against several state-of-the-art baselines on the COVID-19 classification task.

Our COVID-CTx dataset is used in all approaches. Besides COVID-CTx, we also evaluate the seven networks on the COVID-19 Radiography Database. All images are resized to 512 x 512 x 3. The ratio of training, validation, and test sets is 0.6 : 0.2 : 0.2, and each patient belongs to a single set. BN is used in all models, and binary cross-entropy serves as the loss function. The optimizer is Adam [51] with an initial learning rate of 5e-5 and a weight decay of 1e-7. The models are implemented in Keras and trained on a Tesla V100. All models are trained for 40 epochs with a mini-batch size of 16. We evaluate the networks using four metrics: accuracy, precision, recall, and F1-score, whose formulas are given in Table 4, where TP and TN denote true positives and true negatives, and FP and FN denote false positives and false negatives. Three-fold cross-validation is applied in all approaches.

To demonstrate the efficacy of AM-SdenseNet and investigate the effects of CBAM and depthwise separable convolutions, we first experiment on DenseNet (the initial network), AM-denseNet (DenseNet combined with CBAM), and AM-SdenseNet. The comparative performance of the three CNNs on the COVID-19 classification problem, with and without augmentation, is shown in Table 5, and the corresponding AUC curves are shown in Fig.10. We also built confusion matrices for the three models, shown in Fig.11. As Table 5 shows, all three models improve on all four performance metrics with augmentation, confirming that the benefits of augmentation are highly significant when the dataset is limited. Comparing DenseNet with AM-denseNet, the network combined with CBAM yields higher classification performance. Comparing AM-denseNet with AM-SdenseNet, depthwise separable convolution reduces network parameters by 17.9% while improving precision by 2.03%.
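For concreteness, the training setup described above might be expressed in Keras roughly as follows; `build_am_sdensenet`, `x_train`, `y_train`, `x_val`, and `y_val` are hypothetical placeholders, since the paper does not release code, and the 1e-7 weight decay would be added via the optimizer's `weight_decay` argument in recent Keras versions:

```python
# Sketch of the paper's stated training configuration (placeholders noted above).
import tensorflow as tf

model = build_am_sdensenet(input_shape=(512, 512, 3))  # hypothetical model builder
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss="binary_crossentropy",  # binary cross-entropy, as stated in the paper
    metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=40, batch_size=16)
```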
These results empirically demonstrate that depthwise separable convolutions reduce training parameters while keeping network performance. In summary, the performance of AM-SdenseNet benefits considerably from CBAM and depthwise separable convolutions. Trained with augmented images, AM-SdenseNet produces the highest accuracy of 99.18%, precision of 99.32%, recall of 98.97%, and F1-score of 99.14% with the fewest parameters (3.38M). Fig.10 (a: without data augmentation; b: with data augmentation) shows that AM-SdenseNet achieves the best curve when trained with data augmentation. According to the confusion matrices in Fig.11, AM-SdenseNet classifies the Normal class with 0 misclassifications and the COVID-19 class with 3 misclassifications.

Because few datasets with large numbers of COVID-19 CT scans are publicly available due to privacy concerns, we further illustrate the effectiveness of AM-SdenseNet by separately investigating the performance of networks trained on COVID-CTx and on the COVID-19 Radiography Database with different backbones: MobileNet, InceptionV3, ResNet50, VGG16, DenseNet169, DenseNet121, and AM-SdenseNet. Table 6 shows the metrics for the seven networks trained on the two datasets. Trained on COVID-CTx, DenseNet169 and VGG16 perform better than InceptionV3 and ResNet50 with fewer parameters. Moreover, DenseNet121 outperforms DenseNet169 on the COVID-19 classification task, achieving higher accuracy and precision, which supports the observation that conventional shallow CNN models produce better results than deeper models when the dataset is limited. However, a network that is too simple cannot extract features well: MobileNet, for instance, lacks strong feature representation learning capability and therefore produces low classification performance. Compared with MobileNet, DenseNet strengthens feature propagation while substantially reducing the number of parameters; its performance benefits greatly from dense connections, as seen in the good results of DenseNet121 and DenseNet169 on COVID-19 classification. AM-SdenseNet, however, has a much smaller number of parameters yet performs better than deeper networks such as DenseNet121 and DenseNet169. The reason is that large networks are more prone to overfitting, especially since our dataset is fairly small; under such circumstances, CBAM and depthwise separable convolutions have a better chance to show their value. According to the confusion matrices in Fig.11 and Fig.13, AM-SdenseNet achieves the highest rate of correctly classifying both the COVID-19 and Normal classes; the other models classify the Normal class well but perform less well on COVID-19.

Compared with CT, X-ray imaging is more easily accessible around the world. However, because the ribs are projected onto soft tissue in 2D, confounding image contrast, the classification of X-ray images is even more challenging. Trained on the COVID-19 Radiography Database, AM-SdenseNet reaches the best results, with an accuracy of 98.62%, confirming that AM-SdenseNet is broadly applicable to COVID-19 diagnosis. In Fig.12, AM-SdenseNet also achieves the best curve on this classification task. Across the above experiments, AM-SdenseNet achieves the best performance on the COVID-19 classification task compared with the other models.
These experimental results not only illustrate the effectiveness of COVID-CTx but also provide concrete evidence that AM-SdenseNet can improve the speed and accuracy of COVID-19 diagnosis. A comparison of AM-SdenseNet with the latest approaches for COVID-19 classification is presented in this section. Table 7 shows that most of these approaches use only limited numbers of COVID-19 images; given the problem of insufficient samples, they adopt models such as DenseNet, ResNet, and ensemble networks to combat overfitting. Considering the performance metrics in Table 7, our approach outperforms the considered state-of-the-art approaches, achieving the best classification performance on both CT scans and chest X-ray images. This further confirms that AM-SdenseNet is superior to the other algorithms for COVID-19 diagnosis and useful for controlling the spread of infection.

In this work, we aimed to develop a simple and efficient CAD system to diagnose COVID-19 from CT scans. To accelerate open research in this area, we established the publicly available dataset COVID-CTx; to our knowledge, it has the richest COVID-19 features among public datasets to date. Although it is the largest of its kind, it still carries a risk of overfitting for data-hungry deep learning models. To address this, we proposed a lightweight hybrid neural network, AM-SdenseNet, which synergistically applies CBAM to DenseNet in each block with residual learning, greatly improving the representation of lesion features while suppressing less relevant ones. In addition, we replaced some regular convolutions in DenseNet with depthwise separable convolutions to reduce training parameters while keeping network performance. Experiments show that AM-SdenseNet can greatly improve the speed and accuracy of COVID-19 diagnosis, which is extremely useful for controlling the spread of infection.

Though the largest of its kind, COVID-CTx is still insufficient for training large networks, which require more data; deep learning with small samples therefore remains an important future research direction, for which unsupervised methods such as stacked auto-encoders [54] and restricted Boltzmann machines [55] can serve as references. Considering actual clinical needs, the CAD system could in the future be combined with hospital imaging systems and electronic medical records to support the follow-up treatment of patients.

No funds, grants, or other support was received.
References

Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases
Essentials for radiologists on COVID-19: an update - Radiology scientific expert panel
CT imaging features of 2019 novel coronavirus (2019-nCoV)
COVID-19 pneumonia manifestations at the admission on chest ultrasound, radiographs, and CT: single-center study and comprehensive radiologic literature review
Sensitivity of chest CT for COVID-19: comparison to RT-PCR
Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation
Computer-aided classification of lung nodules on computed tomography images via deep learning technique
Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks
Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer
Classification using deep learning neural networks for brain tumors
Deep learning-based detection for COVID-19 from chest CT using weak label
Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography
Lung infection quantification of COVID-19 in CT images with deep learning
A deep learning system to screen novel coronavirus disease 2019 pneumonia
Development and evaluation of an artificial intelligence system for COVID-19 diagnosis
Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images
Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy
AI-assisted CT imaging analysis for COVID-19 screening: building and deploying a medical AI system in four weeks
Sample-efficient deep learning for COVID-19 diagnosis based on CT scans
Can AI help in screening viral and COVID-19 pneumonia?
U-Net: convolutional networks for biomedical image segmentation
UNet++: a nested U-Net architecture for medical image segmentation, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support
V-Net: fully convolutional neural networks for volumetric medical image segmentation
Deep residual learning for image recognition
Feature pyramid networks for object detection
COVID-19 CT lung and infection segmentation dataset
Densely connected convolutional networks
CBAM: convolutional block attention module
Rigid-motion scattering for texture classification
MobileNets: efficient convolutional neural networks for mobile vision applications
Rethinking the Inception architecture for computer vision
Very deep convolutional networks for large-scale image recognition
Going deeper with convolutions
Batch normalization: accelerating deep network training by reducing internal covariate shift, in: International Conference on Machine Learning
Flattened convolutional neural networks for feedforward acceleration
Proceedings of the IEEE International Conference on Computer Vision Workshops
Xception: deep learning with depthwise separable convolutions
Distilling the knowledge in a neural network
Control of goal-directed and stimulus-driven attention in the brain
Residual attention network for image classification
Squeeze-and-excitation networks
Attention U-Net based adversarial architectures for chest X-ray lung segmentation
The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans
K-means-type algorithms: a generalized convergence theorem and characterization of local optimality
Novel centroid selection approaches for k-means clustering based recommender systems
Clustering approach based on mini-batch k-means for intrusion detection system over big data
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings
Deep learning model for distinguishing novel coronavirus from other chest related infections in X-ray images
Adam: a method for stochastic optimization
Automated diagnosis of COVID-19 using deep supervised autoencoder with multi-view features from CT images
COVID-ResNet: a deep learning framework for screening of COVID-19 from radiographs
Stacked enhanced auto-encoder for data-driven soft sensing of quality variable
Deep learning of representations for unsupervised and transfer learning

Qian Li: conceptualization of this study, methodology, dataset, experiments, and writing of the paper. Jiangbo Ning: investigation, validation, writing, review, and editing. Jianping Yuan: data processing. Ling Xiao: validation and formal analysis.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.