key: cord-0965693-esu6t7ui
authors: Quan, Hao; Xu, Xiaosong; Zheng, Tingting; Li, Zhi; Zhao, Mingfang; Cui, Xiaoyu
title: DenseCapsNet: Detection of COVID-19 from X-ray Images Using a Capsule Neural Network
date: 2021-04-15
journal: Comput Biol Med
DOI: 10.1016/j.compbiomed.2021.104399
sha: 670e1940a34614a22b11fabf06ef6c628afd5be0
doc_id: 965693
cord_uid: esu6t7ui

The global pandemic of novel coronavirus pneumonia (COVID-19) remains a very difficult situation. Because the outbreak is recent, COVID-19 chest X-ray (CXR) images that can be used for deep learning analysis are very scarce. To address this problem, we propose DenseCapsNet, a deep learning framework that fuses a dense convolutional network (DenseNet) with a capsule neural network (CapsNet), leveraging their respective advantages and reducing the dependence of convolutional neural networks on large amounts of data. Using 750 CXR images of the lungs of healthy subjects, patients with other pneumonia and patients with COVID-19, the method obtains an accuracy of 90.7% and an F1 score of 90.9%, and the sensitivity for detecting COVID-19 reaches 96%. These results show that the deep fusion neural network DenseCapsNet performs well in detecting COVID-19 from CXR images.

COVID-CAPS, and they achieved excellent results, with an overall accuracy of 95.7% and sensitivity of 90%. The results proved that capsule network detection of COVID-19 was feasible.

Analyzing medical images with deep learning places high demands on both the quality and the quantity of the data, and it is difficult to obtain a large number of good-quality medical images when a new epidemic suddenly occurs. In this work, we focus on training accurate detection models on small COVID-19 datasets. We take the capsule network as the main body and DenseNet [19] as the feature extractor, and propose DenseCapsNet, a deep learning framework for COVID-19 detection.

We show through experiments that DenseCapsNet can effectively train an accurate COVID-19 detection model on such small datasets. Compared with COVID-19 detection models that use the same data, our proposed network has better overall accuracy and sensitivity and detects COVID-19 cases more accurately. In addition, this paper presents a chest X-ray (CXR) image preprocessing procedure to address data heterogeneity between public datasets, and analyzes the potential feasibility of the framework for patient triage and medical diagnosis. The main contributions of this paper are as follows:

1) Based on the capsule neural network (CapsNet) and the dense convolutional network (DenseNet), we propose DenseCapsNet, a deep learning framework that performs end-to-end classification for automatic COVID-19 diagnosis.

2) DenseCapsNet achieves state-of-the-art performance, increasing the sensitivity of COVID-19 detection from X-rays from 69% to 96%.

3) A reasonable set of preprocessing steps is proposed to alleviate image heterogeneity among different datasets.

The overall structure of the framework is shown in Fig. 1 . First, to solve the problem of image heterogeneity between different dataset resources, CXR images are preprocessed to realize data normalization. Then, the preprocessed data are fed into the segmentation network to extract the lung region of the CXR image. Finally, the DenseNet part of the classification network extracts rich features from the lung region and transfers the features to the capsule network to infer the disease type of the CXR image. Hereinafter, the network framework will be described in detail.

Using different scanning equipment and scanning methods will inevitably lead to certain heterogeneity in data resolution, image size, image definition and other issues. To reduce the impact of data heterogeneity on network performance, we propose some preprocessing steps for CXR, the details of which are shown in Fig. 2 . First, grayscale processing is performed on the original CXR image and then the local contrast of the CXR image is improved by using contrast-limited adaptive histogram equalization (CLAHE) [20] to obtain more image details.

After the image enhancement of the CXR image is completed, data augmentation is carried out. Data augmentation methods should be as random as possible to generate more meaningful training data. In this study, random rotation, vertical flipping and random cropping are used during training to alleviate overfitting. We also normalize the data to speed up model fitting and ultimately improve the generalization ability of the model:

$$x' = \frac{x - \mathrm{mean}}{\mathrm{std}}$$

In the formula, x is the pixel intensity value in each of the three red-green-blue (RGB) channels of the CXR image, mean is the average pixel intensity value of each channel, and std is the standard deviation of the pixel intensity of each channel.
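The normalization step, (x - mean) / std applied per channel, can be illustrated with a minimal pure-Python sketch. The function names and the toy intensities are assumptions made for illustration; the paper does not state whether the statistics are computed per image or over the whole dataset, so they are passed in explicitly here.

```python
from statistics import mean, pstdev

def channel_stats(pixels):
    """Return (mean, std) of one channel's pixel intensities."""
    return mean(pixels), pstdev(pixels)

def normalize(pixels, channel_mean, channel_std):
    """Apply (x - mean) / std to every pixel of one channel."""
    if channel_std == 0:
        # A constant channel carries no contrast; map it to zeros.
        return [0.0 for _ in pixels]
    return [(x - channel_mean) / channel_std for x in pixels]

# Toy example: one flattened channel with made-up intensities.
channel = [0, 64, 128, 255]
mu, sigma = channel_stats(channel)
normalized = normalize(channel, mu, sigma)
```

After this transformation the channel has zero mean and unit standard deviation, which is the property that typically helps optimization converge faster.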

In this work, we need to extract the lung contour from CXR images with a segmentation network. We select TernausNet [21], which has state-of-the-art (SOTA) performance, as the semantic segmentation part. TernausNet uses the pretrained VGG11 [22] as the encoder, with the fully connected layers, the last max pooling layer and the softmax layer of VGG11 removed. In the decoder, upsampling and convolution operations are applied to the feature maps extracted by the encoder, and the high-resolution features of the encoder are combined through skip connections. The specific structure is shown in Fig. 3.

When training TernausNet, we use three loss functions for back propagation. Following reference [23], the BCEWithLogitsLoss, SoftDiceLoss and InvSoftDiceLoss functions are each used to calculate a loss value, and the three values are summed to obtain the total loss. Back propagation with this total loss gave the best segmentation results.
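How the three loss terms combine can be sketched in plain Python on a flattened mask. This is only an illustration under assumptions: the soft Dice form with a smoothing constant of 1, and the inverse variant computed on the complemented prediction and mask, are common definitions and are not confirmed to match reference [23] exactly. All function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits."""
    terms = [max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
             for z, t in zip(logits, targets)]
    return sum(terms) / len(terms)

def soft_dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice score (assumed form, smoothing constant is an assumption)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    score = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return 1.0 - score

def inv_soft_dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss on the background, i.e. on the complemented mask (assumed form)."""
    return soft_dice_loss([1.0 - p for p in probs],
                          [1.0 - t for t in targets], smooth)

def total_segmentation_loss(logits, targets):
    """Sum of the three loss values, as described in the text."""
    probs = [sigmoid(z) for z in logits]
    return (bce_with_logits(logits, targets)
            + soft_dice_loss(probs, targets)
            + inv_soft_dice_loss(probs, targets))
```

A confident, correct prediction drives all three terms toward zero, while a confident wrong prediction is penalized by each term.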

C. Classification Network

1) Using the convolutional neural network for feature extraction: Through its dense connections, DenseNet lets information flow fully, reuses features, strengthens feature propagation and reduces the number of parameters. These characteristics enable DenseNet to achieve better results than ResNet [24] with the same number of parameters in less time. We use DenseNet121 pretrained on ImageNet [25] to avoid the effect of random initialization on the model fitting ability and to ensure, as far as possible, that the model achieves good results on a small dataset. DenseNet121 is the simplest type of dense convolutional network and has the smallest number of parameters. The network structure is shown in Fig. 4. We resized the chest X-ray data to a unified size, applied data augmentation and then converted the images into tensors. After normalization, we input them into the DenseNet121 network.

Using DenseNet121's excellent feature extraction and feature reuse capabilities, we retain as many of the detailed features of the data as possible and finally output 1024 features.

2) Capsule Network: The architecture of the capsule network is relatively simple and mainly consists of a series of capsule layers [17]. Each capsule layer is composed of multiple capsules, and each capsule is composed of a group of neurons. The primary capsule layer is preceded by a convolutional layer of 512 kernels of size 1×1, which reduces the output features of DenseNet to 512. After a reshaping operation, the primary capsule layer is formed. Dynamic routing is adopted between the primary capsule layer and the convolutional capsule layer to route the output vectors to all possible parent capsules.
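Dynamic routing between a child capsule layer and a parent capsule layer can be sketched as follows. This is a minimal pure-Python illustration of routing by agreement as described in reference [17], not the authors' implementation; the prediction vectors `u_hat` are assumed to be already computed (in a real network they come from learned transformation matrices), and the function names are hypothetical.

```python
import math

def squash(v, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in v)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def vector_length(v):
    return math.sqrt(sum(x * x for x in v))

def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction vector of child capsule i for parent capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for it in range(iterations):
        # Each child distributes its output over the parents.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        if it < iterations - 1:
            # Raise the logit where a prediction agrees with the parent output.
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

The length of each returned parent vector can be read as the probability that the corresponding entity is present, which is how the class capsules are interpreted.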

For the loss function of the capsule network, we use the spread loss, which can reduce the sensitivity of training to the model initialization and hyperparameters. Let $a_t$ be the activation of the target class and $a_i$ the activation of a wrong class $i$. If the distance between $a_t$ and $a_i$ is less than the margin $m$, the loss function penalizes the model according to $(m - (a_t - a_i))^2$. The loss function is defined as follows:

$$L_i = \left(\max\left(0,\; m - (a_t - a_i)\right)\right)^2, \qquad L = \sum_{i \neq t} L_i$$

The initial value of the margin is set to 0.2. During training, the margin increases by 0.1 per iteration until it reaches 0.9, which can prevent capsules from dying early.
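The spread loss and the growing margin can be sketched in a few lines of Python. The sketch assumes the standard spread loss form, the sum over wrong classes of max(0, m - (a_t - a_i)) squared, and a linear margin schedule capped at 0.9; the function names and the example activations are illustrative only.

```python
def spread_loss(activations, target, margin):
    """Sum of squared margin violations between the target class and each wrong class."""
    a_t = activations[target]
    return sum(max(0.0, margin - (a_t - a_i)) ** 2
               for i, a_i in enumerate(activations) if i != target)

def margin_at(step, start=0.2, increment=0.1, maximum=0.9):
    """Margin after `step` increases, starting at 0.2 and capped at 0.9."""
    return min(maximum, start + increment * step)

# Made-up class activations: the target (index 0) only narrowly beats class 1.
example = spread_loss([0.5, 0.45, 0.1], target=0, margin=margin_at(0))
```

With a small initial margin, a target activation that is only slightly ahead of the others incurs a small penalty; as the margin grows, the same gap is penalized more strongly.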

The structure of the proposed DenseCapsNet is shown in Fig. 1. As a shallow network, CapsNet cannot extract deep features as a deep neural network does. Above, we briefly introduced DenseNet121 and explained its leading position among CNNs. Using DenseNet121 as the feature extractor is therefore a good choice and can compensate for the deficiency of CapsNet to some extent. DenseCapsNet is mainly composed of DenseNet121 and CapsNet, so it leverages the advantages of both CNNs and capsule networks. First, we retain the feature extraction part of DenseNet121 (excluding the linear layer) to extract features from the COVID-19 chest X-ray data.

We input novel coronavirus pneumonia 3D X-ray data and adjust the data to a unified size. DenseNet121 outputs 1024 feature maps and passes them to the first convolutional layer of the capsule network, which reduces them to 512 and reshapes them into vectors that are passed to the primary capsule layer. Finally, the capsule layer outputs the instantiation parameters of the image categories, such as normal and COVID-19, and the vector lengths represent the probability of each category.

The datasets used in this study are all public datasets; these datasets have been studied many times by other COVID-19 researchers, indicating the research value of these data. Statistics of CXR image data used to segment network training are shown in Table Ⅰ , and statistics of the dataset used to classify network training are shown in Table Ⅱ . part of the images to avoid the disclosure of the information marked to affect the model training.

We collected the data uniformly and asked a radiologist to select four representative CXR images (Fig. 5) and describe the characteristics of the different cases in detail. In Fig. 5(a), the markings of both lungs were clear and no parenchymal lesion was seen in the lungs. The hila of both lungs were not enlarged, and no nodules or mass shadows were seen. There was no abnormality in the size or shape of the heart shadow. Both diaphragms were smooth, and both costophrenic angles were sharp. In summary, the image should be diagnosed as normal. In Fig. 5(b), the texture of both lungs was thickened, and the right lower lung had a small flocculent shadow with uneven density and fuzzy edges. There was consolidation of the lower lobe of the left lung. In summary, a diagnosis of bacterial pneumonia should be made. In Fig. 5(c), the lung texture was increased and blurred. A focal patchy shadow was apparent in the lung field. There were diffuse high-density shadows in both lungs. In summary, viral pneumonia should be diagnosed. In Fig. 5(d), a ground-glass-like appearance is visible at the margins of the pulmonary vessels, with unclear boundaries. The lesions are asymmetrically patchy, and there is diffuse air turbidity. In summary, COVID-19 should be diagnosed.

After describing the lesion characteristics of typical patients in the public dataset, the radiologists also used a computer to randomly select 200 chest X-ray images from the dataset for diagnosis. The radiology experts consider the labels provided by the public dataset to be accurate and the patient characteristics of the various pneumonias in the dataset to be representative, further confirming the research value of the dataset.

To test the performance of the neural network framework we proposed, a computer was used to randomly select 750 CXR images from the above dataset, with 250 for normal, 250 for other pneumonia and 250 for novel coronavirus pneumonia. Then, 750 CXR images were randomly divided into training, validation and testing sets. See Table Ⅲ for details of the dataset division.


In the experiment, for the semantic segmentation network, the number of epochs was set to 75, the learning rate was set to 1e-4, and the Adam optimizer [30] was used. For the classification network, the number of epochs was set to 30, the learning rate was 1e-4, the Adam optimizer was used, and the number of dynamic routing iterations of the capsule network was 3. All networks were implemented in PyTorch [31] and run on two NVIDIA TITAN Xp GPUs.

The Jaccard coefficient and the Dice coefficient were used to evaluate the performance of the segmentation network. Both metrics measure the similarity between two finite sets. For two sets A and B, they are defined as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$
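For segmentation, the two sets are the pixels marked as lung in the predicted mask and in the reference mask. A minimal sketch using Python sets of pixel coordinates follows; the function names and the toy masks are illustrative assumptions, and returning 1.0 for two empty masks is a convention chosen here.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of pixel coordinates."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|) for two collections of pixel coordinates."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

# Toy masks given as sets of (row, column) pixels.
predicted = {(0, 0), (0, 1), (1, 0)}
reference = {(0, 1), (1, 0), (1, 1)}
j = jaccard(predicted, reference)  # 2 shared pixels out of 4 in the union
d = dice(predicted, reference)     # 2 * 2 shared pixels out of 3 + 3
```

The two coefficients are monotonically related by Dice = 2J / (1 + J), so they always rank segmentation results in the same order.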

In this study, the preprocessing method described above was first applied to the classification dataset to achieve CXR image histogram equalization and contrast enhancement. The procedure is implemented using Python and OpenCV. First, grayscale processing is performed on the original three-channel CXR image to obtain a single-channel image. Next, CLAHE is used to improve the image contrast (parameter settings: clipLimit = 3.0, tileGridSize = (8, 8)), and the image size is adjusted uniformly to 512×512. Experimental results are shown in Fig. 6. After preprocessing, CXR image details were more prominent, the lung contour was clearer, and the histogram was more balanced.

In the experiment, we selected three semantic segmentation networks with excellent performance, namely, GCN [34] , SegNet [35] and TernausNet. The NLMMC dataset was randomly divided into training and validation sets at an 8:2 ratio. All the above three semantic segmentation networks were trained and tested with the segmented NLMMC dataset, and the Jaccard and Dice coefficients were used to evaluate the ability of the network model to extract lung contours. The final results are shown in Table Ⅳ .

TernausNet clearly achieved the best performance, and its Jaccard coefficient is nearly 0.1 higher than that of the second-place SegNet. This may be because the encoder is VGG11: the network is relatively shallow, and the pretrained network has a stronger generalization ability and is more suitable for segmentation tasks with small datasets. Fig. 7 shows the lung contours extracted by TernausNet from the NLMMC data. Because the classification dataset lacks manually annotated masks, it cannot be used to train the segmentation network. Fortunately, the TernausNet segmentation model trained with the NLMMC dataset also achieved good results in extracting the lung contour from the classification data, and the results are shown in Fig. 8.

The classification performance of DenseCapsNet, the deep learning framework proposed in this paper, is shown in Table Ⅴ. All experiments were repeated five times and averaged. When training ResNet and DenseNet, we used ImageNet pretrained parameters to initialize the networks and then trained them with the COVID-19 dataset without freezing any layers. The confusion matrices for ResNet50, DenseNet121, and DenseCapsNet are shown in Fig. 9. Before DenseCapsNet was formally proposed, the feature extraction part was compared first. In this work, ResNet50 and DenseNet121 were selected as candidate feature extraction parts. It can be clearly found from Fig. 10 . After the original CXR image is preprocessed, the lung contour is clearer and the contrast is more obvious, which can significantly improve the segmentation capability of the segmentation network on the classification dataset. This is mainly due to the CLAHE operation, which spreads the gray-level distribution of the original image over the range [0, 255] (Fig. 6). The gray distribution is more uniform, and the image details are more obvious. Histogram equalization can indeed alleviate the heterogeneity of the images.

The proposed framework integrates DenseNet and CapsNet and uses the classification dataset for training and testing. In Table Ⅴ, all five metrics of DenseCapsNet are superior to those of DenseNet121, with an average increase of 1.34%. The sensitivity analysis in Table Ⅵ shows that the sensitivity of DenseCapsNet for detecting COVID-19 reached 96%, significantly better than that of DenseNet121. By comparison, the sensitivity of radiologists diagnosing COVID-19 from CXR images is 69%, and even the gold standard RT-PCR reaches only 91% sensitivity [36].

Fig. 11 shows the result of using the Grad-CAM [37] algorithm to locate the COVID-19 lesion area; the red area is the suspicious COVID-19 lesion area located by the network. First, we used the trained model to locate the suspected lesion area in the segmented chest X-ray. Next, we superimposed the resulting heat map onto the original chest X-ray image to fit the radiologist's reading habits. Radiologists gave professional explanations for the CXR images shown in Fig. 11. In image a, the texture of both lungs was thickened, and a few flocculent shadows were seen in the lower lobes of both lungs, with uneven density and blurred edges; the exudation shadow was most obvious in the lower lobe of the right lung. In image b, the lower lobe of the right lung is consolidated, with scattered ground glass nodules in both lungs and multiple plaques and ground glass shadows in the outer and subpleural lung. In image c, there is an exudative shadow in the lower lobe of the right lung, with an enlarged heart. In image d, there are ground glass nodules in the upper lobe of the left lung, with thickened and blurred local lines. The remaining lung has clear texture and no parenchymal lesion was observed. The hila of both lungs were not enlarged, and the shape of the heart shadow was not abnormal.

The heat maps of CXR images a, b and d are basically consistent with the lesion areas diagnosed by the radiologists. Unfortunately, the heat map of image c did not accurately locate the lesion area. In discussion with the radiology experts, it was found that there were artifacts on both sides of the pleural area of the image, which led to the misjudgment by the network.
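Metrics such as accuracy, per-class sensitivity and F1 score, as reported in Tables Ⅴ and Ⅵ, can all be derived from a confusion matrix like those in Fig. 9. The sketch below shows one way to do this in Python. The matrix values are made up for illustration and are not the paper's results, and the function name and the row/column convention are assumptions.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp    # others called k
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        per_class.append({"sensitivity": sensitivity,
                          "precision": precision,
                          "f1": f1})
    return accuracy, per_class

# Hypothetical 3-class matrix (normal, other pneumonia, COVID-19); not from the paper.
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 1, 9]]
accuracy, per_class = classification_metrics(toy_cm)
covid = per_class[2]
```

Sensitivity for the COVID-19 class is the fraction of true COVID-19 images that the model labels as COVID-19, which is the quantity compared against radiologists and RT-PCR in the text.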

In the context of the global COVID-19 outbreak, many countries are facing a shortage of medical resources to diagnose, isolate and treat COVID-19 patients in a timely manner. Chest CT is more sensitive for detecting and diagnosing COVID-19 than CXR imaging, but CT equipment is expensive and difficult to move, and cross-infection may occur. Therefore, the use of an inexpensive, portable medical X-ray machine to obtain chest X-rays of patients with clinical characteristics such as cold and fever may be more suitable for the current global epidemic situation. Increasing the sensitivity of CXR to COVID-19 diagnosis is valuable and can alleviate some problems, such as shortages of medical resources.

In Fig. 12 , we present a flowchart of patient diagnosis with potential application value according to the proposed deep learning framework. The subject still uses the proposed deep learning framework to detect CXR images. According to the confusion matrix in Fig. 9 (c) , it can be seen that the proposed deep However, our method can help radiologists pinpoint the location of the suspected lesion, and we also provide a feasible solution for the proposed method to assist diagnosis ( Fig. 12. ), which will significantly relieve the pressure on doctors to respond to the epidemic.

To improve the sensitivity of CXR images in diagnosing COVID-19, this paper proposes a deep learning framework for COVID-19 detection based on DenseNet121 and CapsNet. The framework uses DenseNet121 to extract features and CapsNet to package features into capsules, reducing the dependence of neural networks on data volume. Only 250 normal chest radiographs, 250 other pneumonia chest radiographs and 250 COVID-19 chest radiographs were used to achieve remarkable diagnostic results, with accuracy and sensitivity reaching 90.7% and 96%, respectively. We have shown that, for detecting COVID-19 with small datasets, the fusion of a CNN and CapsNet performs better than a CNN alone. We also propose a CXR image preprocessing operation, which can alleviate image heterogeneity between different datasets, and show that it improves the ability of the segmentation network to extract lung contours across datasets. A deep learning framework with high sensitivity for diagnosing COVID-19 may be used for large-scale screening of suspected COVID-19 patients, so this study also provides a feasibility analysis of large-scale screening and diagnosis of COVID-19 using the proposed neural network framework. We sincerely hope that the epidemic will pass as soon as possible and that people will be able to return to their normal lives.

The authors would like to thank health care workers worldwide and those in other industries who have contributed to COVID-19 relief efforts since the COVID-19 outbreak, as well as contributors to the public chest X-ray dataset. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We sincerely wish the people of all countries an early victory over COVID-19.

Fig. 1. The overall architecture of the proposed neural network. Part (a) is the segmentation network, and part (b) is DenseCapsNet for feature extraction and classification.

The transition layer can reduce the size of the feature map and promote feature transfer between adjacent dense blocks. Finally, the information transmission of the network will be more intensive.

Fig. 11. Four chest X-ray images of COVID-19 patients. The first row is the original CXR, and the second row is the suspicious lesion area located by the network.

Timeline of WHO's response to COVID-19. World Health Organization Web site. Accessed

COVID-19) Dashboard. World Health Organization Web site. Accessed

Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: A report of 1014 cases

Chest Imaging Appearance of COVID-19 Infection

Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study

Chest CT for Typical 2019-nCoV Pneumonia: Relationship to Negative RT-PCR Testing

Frequency and Distribution of Chest Radiographic Findings in COVID-19 Positive Patients

Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique

COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images

Automatic Detection of Coronavirus Disease (COVID-19) Using X-ray Images and Deep Convolutional Neural Networks

Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks

COVIDX-net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images

Extracting possibly representative COVID-19 biomarkers from X-ray images with deep learning approach and image data related to pulmonary diseases

COVID-ResNet: A Deep Learning Framework for Screening of COVID19 from Radiographs

Dynamic Routing Between Capsules

COVID-CAPS: A capsule network-based framework for identification of COVID-19 cases from X-ray images

Densely Connected Convolutional Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Contrast Limited Adaptive Histogram Equalization

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

Very Deep Convolutional Networks for Large-Scale Image Recognition

Lung-Segmentation. Accessed

Deep residual learning for image recognition

ImageNet Classification with Deep Convolutional Neural Networks

Two public chest X-ray datasets for computer-aided screening of pulmonary diseases

COVID-19 image data collection

COVID-19 Radiography Database

Adam: A Method for Stochastic Optimization

Pytorch: An imperative style, high-performance deep learning library

Can AI Help in Screening Viral and COVID-19 Pneumonia?

Deep-COVID: Predicting COVID-19 From Chest X-Ray Images Using Deep Transfer Learning

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Frequency and distribution of chest radiographic findings in COVID-19 positive patients

Grad-CAM: Visual explanations from deep networks via gradient-based localization

Within the lack of chest covid-19 x-ray dataset: a novel detection model based on gan and deep transfer learning

Automated detection of COVID-19 cases using deep neural networks with X-ray images

CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization

Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks

Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms

• A preprocessing method is proposed to alleviate the problem of image heterogeneity
• Based on DenseNet and CapsNet, the deep learning framework DenseCapsNet is proposed
• The sensitivity of COVID-19 detection based on DenseCapsNet was 96%
• Location of COVID-19 lesions

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.