key: cord-0834177-bnnh9v0g
authors: Manokaran, Jenita; Zabihollahy, Fatemeh; Hamilton-Wright, Andrew; Ukwatta, Eranga
title: Detection of COVID-19 from chest x-ray images using transfer learning
date: 2021-08-23
journal: J Med Imaging (Bellingham)
DOI: 10.1117/1.jmi.8.s1.017503
sha: 333aab28277637d0e17a3bb6968088ebb8192569
doc_id: 834177
cord_uid: bnnh9v0g

Purpose: The objective of this study is to develop and evaluate a fully automated, deep learning-based method for detecting COVID-19 infection from chest x-ray images.

Approach: The proposed model was developed by replacing the final classifier layer of DenseNet201 with a new network consisting of a global average pooling layer, a batch normalization layer, a dense layer with ReLU activation, and a final classification layer. We then performed end-to-end training of all layers, starting from the initial pretrained weights. Our dataset comprised a total of 8644 images (4000 normal, 4000 pneumonia, and 644 COVID-19 cases), a large real dataset that reflects the class imbalance of the patient population. The proposed method was evaluated based on accuracy, sensitivity, specificity, ROC curve, and F1-score using a test dataset of 1729 images (129 COVID-19, 800 normal, and 800 pneumonia). As a benchmark, we also compared our results with those of seven state-of-the-art pretrained models and of a lightweight CNN architecture designed from scratch.

Results: The proposed model based on DenseNet201 achieved an accuracy of 94% in detecting COVID-19 and an overall accuracy of 92.19%. It achieved an AUC of 0.99 for COVID-19, 0.97 for normal, and 0.97 for pneumonia, and it outperformed the alternative models in terms of overall accuracy, sensitivity, and specificity.

Conclusions: Our proposed automated diagnostic model yielded an accuracy of 94% in the initial screening of COVID-19 patients and an overall accuracy of 92.19% using chest x-ray images.
results, 2 and the cost of the test is a major concern in many countries that have a private health system. Although PCR and antigen tests can now provide a rapid diagnosis, assessing the lungs with medical imaging provides information on disease burden. Moreover, faster and earlier detection of COVID-19 would help isolate affected patients sooner and thus limit the spread of the disease. Chest radiography (CXR) and computed tomography (CT) are the conventional medical imaging modalities used in lung disease diagnosis. 3, 4 Although CT images are extensively used in COVID-19 diagnosis, 5-7 cost 8 and radiation exposure are major concerns. CXR images are preferred over CT images because they involve less radiation exposure and are widely available. 9 Hence, in this study, CXR images are used for the automatic diagnosis of COVID-19. Deep learning techniques are widely used in fields such as computer vision, machine vision, and speech recognition; computer vision in particular has yielded promising results in image classification tasks. [10] [11] [12] In medical image analysis, deep learning has been widely investigated for computer-aided diagnosis and treatment. 13, 14 Several state-of-the-art deep learning methods have been proposed for the diagnosis of COVID-19 using CXR images [15] [16] [17] [18] [19] and CT images. 5,20,21 Transfer learning approaches based on deep learning have been preferred for detecting COVID-19 because of the limited size of the available datasets. Several studies have applied transfer learning to COVID-19 diagnosis, using a pretrained model as a feature extractor. [22] [23] [24] Table 1 summarizes recent studies describing methods for detecting COVID-19 from CXR images, including the overall datasets used for training and testing the model, the accuracy, and the sensitivity. Ozturk et al.
18 developed the DarkCovidNet model for the detection of COVID-19 using multiclass classification. The model was built as an end-to-end architecture without any separate feature extraction technique, and it achieved a sensitivity of 85.35%. Wang and Wong 25 proposed COVID-Net, a deep convolutional neural network (DCNN) developed using a machine-driven design exploration strategy. One of the major limitations of previous studies is the relatively small size of their test datasets. Moreover, their test sets did not reflect the realistic class imbalance of the patient population. Furthermore, some methods were designed to differentiate COVID-19 from normal cases while ignoring pneumonia cases, or combined bacterial pneumonia, viral pneumonia, and COVID-19 under one class. To address these shortcomings and further improve the diagnosis of COVID-19, we propose a fully automated deep learning-based method for detecting COVID-19 from CXR images. The proposed method is based on a transfer learning approach using the pretrained dense convolutional neural network 201 (DenseNet201) 31 model. Our model was trained and tested using a dataset that reflects the realistic class imbalance of the actual clinical setting. As a benchmark, we compared the results of our method with those of several state-of-the-art CNN architectures, including VGG16, 32 VGG19, 32 DenseNet121, 31 ResNet50, 33 ResNet101, 33 MobileNetv2, 34 and Inceptionv3. 35

2 Materials and Methods

Our dataset comprised 8644 CXR images of normal, acute respiratory distress syndrome (ARDS), COVID-19, MERS, pneumonia, and SARS cases from five open-access GitHub repositories 36-40 used in Wang et al. 25 It consisted of 4000 normal cases, 4000 pneumonia cases, and 644 COVID-19 cases.
Of the 644 COVID-19 images, 20% were taken from the COVID-19 Image Data Collection, 36 5% from the Figure 1 COVID-19 Chest X-ray Dataset, 40 25% from the Actualmed COVID-19 Chest X-ray Dataset, 39 and 50% from the COVID-19 Radiography Database. 38 About 5% of the pneumonia images were taken from the COVID-19 Image Data Collection 36 and the remaining 95% from the RSNA Pneumonia Detection Challenge dataset. 37 All normal images were obtained from the RSNA Pneumonia Detection Challenge dataset. Compared with normal, healthy lungs, the lungs of COVID-19 patients appear patchy and hazy on CXR images. Characteristic findings can be observed in the lower lobes and the periphery of lungs affected by COVID-19. Examples of normal, pneumonia, and COVID-19 images are shown in Fig. 1. Since the purpose of this paper is to correctly classify COVID-19 cases, bacterial and viral pneumonia are grouped under one label (pneumonia). The CXR images are resized to 224 × 224 to be compatible with the pretrained models used in this study, except for Xception, for which the default input size of 299 × 299 is used. All images used for training, validation, and testing are normalized to the range [0, 1]. Since the dataset is imbalanced, data augmentation was used to enlarge the training dataset: the training data of each class were increased to 4000 images, for a total of 12,000 images. The images were horizontally flipped, magnified, and rotated for augmentation. Augmentation was not applied to the validation and test datasets. Of the full dataset, 80% (6915 images) was used for training and validation, and the remaining 20% (1729 images) was used for testing. Of that 80% portion, 10% (692 images) was held out for validation, leaving 6223 images for training the model. A detailed breakdown of the number of images used for training, validation, and testing is provided in Fig. 2.
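The split sizes above can be checked with a few lines of integer arithmetic. This is a minimal sketch, not the authors' code, and it rests on two assumptions that the text does not state: the 20% test split is drawn per class, and fractional counts are rounded half up. Under those assumptions it reproduces the reported test composition (129 COVID-19, 800 normal, 800 pneumonia) as well as the validation and training sizes. The helper names `percent_of` and `split_sizes` are hypothetical.

```python
# Minimal sketch (not the authors' code) of the dataset split arithmetic.
# Assumptions: the 20% test split is taken per class, and fractional
# counts are rounded half up.

CLASS_COUNTS = {"normal": 4000, "pneumonia": 4000, "covid19": 644}


def percent_of(n, pct):
    """Return pct% of n, rounded half up, using integer arithmetic only."""
    return (n * pct + 50) // 100


def split_sizes(counts, test_pct=20, val_pct=10):
    """Return (per-class test counts, train+val total, val total, train total)."""
    # Hold out test_pct% of each class for testing.
    test = {name: percent_of(n, test_pct) for name, n in counts.items()}
    train_val = sum(counts.values()) - sum(test.values())
    # Hold out val_pct% of the remaining images for validation.
    val = percent_of(train_val, val_pct)
    return test, train_val, val, train_val - val


test, train_val, val, train = split_sizes(CLASS_COUNTS)
print(test)                                       # {'normal': 800, 'pneumonia': 800, 'covid19': 129}
print(sum(test.values()), train_val, val, train)  # 1729 6915 692 6223
```

The output makes explicit that 6223 is the training share after the validation images are removed, which is about 72% of the 8644 images rather than 80%.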
In this study, we developed our model based on DenseNet201, 31 a densely connected convolutional neural network, 201 layers deep, in which each layer receives the collective knowledge of all preceding layers, resulting in a compact network. It was designed to address the vanishing gradient problem in deep neural networks by employing skip connections similar to those of ResNet, and DenseNet has been shown to detect pneumonia from CXR images with high accuracy. 41 The fundamental difference between ResNet and DenseNet is that features are combined by summation in ResNet, whereas they are concatenated in DenseNet, leading to a dense connectivity pattern. This connectivity pattern reduces the number of trainable parameters, resulting in a parameter-efficient model that has shown promising results on the ImageNet and CIFAR-100 datasets. It is also memory- and computationally efficient. DenseNet and ResNet have several versions, such as DenseNet121, 31 DenseNet201, 31 ResNet50, 33 and ResNet101, 33 all of which are considered in our study. The numbers indicate the depth of each network, i.e., its number of layers. The proposed model was then evaluated and compared with other state-of-the-art pretrained models, namely VGG16, 32 VGG19, 32 DenseNet121, 31 ResNet50, 33 ResNet101, 33 MobileNetv2, 34 and Inceptionv3, 35 and, for a quantitative justification, with a lightweight CNN model designed from scratch. Based on our previous experiments with cascaded and multiclass classification, we determined that multiclass classification yielded higher accuracy in COVID-19 identification. 42 Also, classification using segmented lungs identified COVID-19 with lower sensitivity than classification using nonsegmented CXR images. 43 Hence, the proposed model performs multiclass classification to distinguish COVID-19 from the normal and pneumonia classes using nonsegmented CXR images.
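The difference between summation and concatenation, and the meaning of the numbers in the model names, can be illustrated with a short sketch. It assumes the DenseNet-BC configuration from the original DenseNet paper (growth rate 32, 64 initial channels, transition layers that halve the channel count, and dense blocks of 6, 12, 24, 16 layers for DenseNet121 and 6, 12, 48, 32 for DenseNet201); these hyperparameters are not stated in this paper. `residual_combine`, `dense_combine`, and `densenet_summary` are hypothetical helpers written only for illustration.

```python
# Sketch: ResNet sums features (channel count fixed); DenseNet concatenates
# them (channel count grows). Hyperparameters are assumed DenseNet-BC values.


def residual_combine(x, fx):
    # ResNet: element-wise sum, so the number of channels is unchanged.
    return [a + b for a, b in zip(x, fx)]


def dense_combine(x, fx):
    # DenseNet: concatenation, so the new layer's channels are appended.
    return x + fx


def densenet_summary(block_config, growth_rate=32, init_channels=64, compression=0.5):
    """Return (weighted layer count, channels entering the classifier)."""
    channels = init_channels
    for i, num_layers in enumerate(block_config):
        # Each dense layer appends growth_rate channels to its input.
        channels += num_layers * growth_rate
        if i < len(block_config) - 1:
            # Transition layer: 1x1 conv that compresses the channels.
            channels = int(channels * compression)
    # Weighted layers: stem conv + 2 convs per dense layer (1x1 bottleneck
    # and 3x3) + one conv per transition layer + final fully connected layer.
    depth = 1 + 2 * sum(block_config) + (len(block_config) - 1) + 1
    return depth, channels


print(len(residual_combine([1.0, 2.0], [0.5, 0.5])))  # 2
print(len(dense_combine([1.0, 2.0], [0.5, 0.5])))     # 4
print(densenet_summary((6, 12, 24, 16)))  # DenseNet121 -> (121, 1024)
print(densenet_summary((6, 12, 48, 32)))  # DenseNet201 -> (201, 1920)
```

Under these assumptions, the depth in each name counts only the convolutional and fully connected layers, and the last dense block of DenseNet201 outputs 1920 channels, so a global average pooling layer placed on top of it would produce a 1920-dimensional feature vector.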
The developed model starts from the initial pretrained weights; the last fully connected layer is removed and replaced with a new network consisting of a global average pooling layer, a batch normalization layer, and a dense layer with ReLU activation. The final classification layer uses the softmax activation function for multiclass classification. This design is kept consistent for all the pretrained models considered in this work. Each model was trained for 100 epochs, and a model checkpoint stored the model with the highest validation accuracy. The adaptive moment estimation (ADAM) optimizer with a learning rate of 0.0001 was used for training, and a batch size of 32 was considered optimal. 44 These parameters were kept the same for all eight models used in this work. All models were trained and tested on SHARCNET, a high-performance computing cluster in Canada. The dataset is imbalanced: it comprises equal numbers of normal and pneumonia CXR images and a much smaller number of COVID-19 images. Overall accuracy alone therefore cannot measure classification performance, because the target class, COVID-19, is the minority class, whereas normal and pneumonia are the majority classes. The minority class is regarded as the positive class, and the majority classes (normal and pneumonia) are grouped under the negative class. The confusion matrix is therefore used to count the true positives (TP), i.e., positive cases correctly identified, and the false negatives (FN), i.e., positive cases misclassified as negative. The true negatives (TN) are negative cases correctly identified, and the false positives (FP) are negative cases misclassified as positive.
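To make the replacement classifier head described above concrete, the following plain-Python sketch computes its forward pass at inference time: global average pooling, batch normalization with stored statistics, a dense layer with ReLU, and a softmax over three classes. It is an illustration, not the authors' implementation. The hidden-layer width (3 here), all weights, the batch normalization epsilon (1e-3, assumed to match the Keras default), and the class order are hypothetical; in the trained model these parameters would come from end-to-end training with ADAM.

```python
import math

# Minimal plain-Python sketch of the replacement classifier head at inference
# time. All weights below are hypothetical toy values, not trained parameters.


def global_average_pool(feature_map):
    """Average an H x W x C feature map over all spatial positions -> C values."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]


def batch_norm_inference(x, mean, var, gamma, beta, eps=1e-3):
    """Normalize each feature with stored statistics, then scale and shift."""
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, m, v, g, b in zip(x, mean, var, gamma, beta)]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_head(feature_map, params):
    x = global_average_pool(feature_map)          # H x W x C -> C
    x = batch_norm_inference(x, *params["bn"])    # per-channel normalization
    x = relu(dense(x, *params["hidden"]))         # dense layer with ReLU
    return softmax(dense(x, *params["out"]))      # class probabilities


# Toy 2 x 2 x 4 feature map (a real DenseNet201 backbone outputs 1920 channels).
toy_features = [[[1.0, 0.0, 2.0, 1.0], [3.0, 0.0, 2.0, 1.0]],
                [[1.0, 4.0, 2.0, 1.0], [3.0, 0.0, 2.0, 1.0]]]
toy_params = {
    "bn": ([1.0] * 4, [1.0] * 4, [1.0] * 4, [0.0] * 4),  # mean, var, gamma, beta
    "hidden": ([[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.4, 0.1, 0.9, 0.3]],
               [0.1, 0.0, -0.1]),
    "out": ([[1.0, -0.5, 0.2], [-0.3, 0.7, 0.4], [0.2, 0.1, -0.6]], [0.0, 0.0, 0.0]),
}
CLASSES = ("covid19", "normal", "pneumonia")  # hypothetical class order

probs = classifier_head(toy_features, toy_params)
print(dict(zip(CLASSES, (round(p, 3) for p in probs))))
```

Because global average pooling collapses the spatial dimensions, the head's parameter count depends only on the number of backbone channels and the hidden width, not on the 224 × 224 input size.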
Because we are more interested in the accuracy for COVID-19 than in the overall accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve were examined for model evaluation. Evaluated on the test dataset, the automated diagnostic tool developed using the DenseNet201 model yielded an overall accuracy of 92% with a COVID-19 sensitivity of 94%. The model correctly classified 764 of 800 normal cases and 709 of 800 pneumonia cases. For COVID-19, it correctly classified 121 of 129 cases; only eight cases were misclassified. Most misclassifications occurred between normal and pneumonia cases, due to the early onset of pneumonia. The confusion matrix and ROC curve for the proposed model are shown in Fig. 3. The proposed model achieved an area under the curve (AUC) of 0.99, 0.97, and 0.97 for the COVID-19, normal, and pneumonia classes, respectively. We compared our method with seven state-of-the-art pretrained models. All models classified the three classes (normal, pneumonia, and COVID-19) with relatively high accuracy. Of the 129 COVID-19 cases in the test dataset, every model correctly classified more than 100. Our proposed model based on DenseNet201 classified 121 cases, ranking first, while ResNet50 detected 117 cases, ranking second. For TN cases, VGG16, DenseNet121, DenseNet201, and ResNet101 each correctly identified more than 700 cases in both the normal and pneumonia classes. VGG16 produced the fewest FPs, with two normal cases misidentified as COVID-19 and no FPs among pneumonia cases. VGG16 also detected the fewest TPs, whereas MobileNetV2 produced the most FPs and the fewest TNs. The confusion matrices for the seven pretrained models are summarized in Fig. 4. The ROC curves for all eight models used in this study are provided in Fig.
5, and the AUCs for all three classes (normal, pneumonia, and COVID-19) were calculated and are shown with the ROC curves. For COVID-19, VGG16, VGG19, DenseNet121, and the proposed model achieved the highest AUC of 0.99, and MobileNetV2 achieved the lowest value of 0.95. For the normal and pneumonia classes, DenseNet121 obtained 0.98 in both cases. The AUC summary in Table 2 shows that DenseNet121 achieved the highest AUC across all three classes, with 0.99 for COVID-19, 0.98 for normal, and 0.98 for pneumonia, whereas MobileNetV2 performed worse than the other models, with 0.95 for COVID-19, 0.96 for normal, and 0.97 for pneumonia. The evaluation metrics (sensitivity, specificity, accuracy, and F1-score) are summarized in Table 3. In terms of overall classification accuracy, the DenseNet201 model distinguished COVID-19 from normal and pneumonia with the highest accuracy, 92%, among all models considered. The DenseNet121 model achieved the same classification accuracy of 92%. VGG16, VGG19, ResNet101, and InceptionV3 obtained a classification accuracy of 90%, 2% less than the highest accuracy. Sensitivity and specificity are particularly important metrics when the dataset is imbalanced. The proposed model achieved the highest sensitivity, 0.94, with a specificity of 0.99. DenseNet121, a shallower model than DenseNet201, achieved the same specificity but a lower sensitivity of 0.89. ResNet50 achieved a sensitivity of 0.91, second only to the proposed model. Because ResNet50 achieved a higher sensitivity than the deeper ResNet101, we infer that greater network depth can improve overall classification accuracy but at the cost of reduced sensitivity.
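The one-vs-rest evaluation described earlier (COVID-19 positive; normal and pneumonia negative) and the metrics in Table 3 can be sketched as follows. `one_vs_rest_counts` and `binary_metrics` are hypothetical helpers, the convention that rows are true classes and columns are predicted classes is an assumption, and the 3 × 3 matrix is made-up toy data, not a result of this study. The last lines use only counts reported in the text (the correctly classified 764 normal, 709 pneumonia, and 121 COVID-19 cases out of 1729) to recover the overall accuracy of 92.19% and the COVID-19 sensitivity of about 94%.

```python
# Minimal sketch of the one-vs-rest evaluation used in the text.
# Assumed convention: cm[i][j] = number of cases of true class i predicted
# as class j. The toy matrix is made-up data, not a result of this study.


def one_vs_rest_counts(cm, pos):
    """Collapse a multiclass confusion matrix to TP, FN, FP, TN for class `pos`."""
    n = len(cm)
    tp = cm[pos][pos]
    fn = sum(cm[pos][j] for j in range(n) if j != pos)  # positives predicted as another class
    fp = sum(cm[i][pos] for i in range(n) if i != pos)  # negatives predicted as positive
    # Every remaining cell is a negative case not predicted as positive, so a
    # normal case predicted as pneumonia still counts as a true negative here.
    tn = sum(cm[i][j] for i in range(n) for j in range(n) if i != pos and j != pos)
    return tp, fn, fp, tn


def binary_metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)  # recall: TP / (TP + FN)
    specificity = tn / (tn + fp)  # TN / (TN + FP)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}


# Toy matrix, class order (covid19, normal, pneumonia); hypothetical numbers.
toy_cm = [[8, 1, 1],
          [1, 15, 4],
          [0, 3, 17]]
toy_counts = one_vs_rest_counts(toy_cm, pos=0)
print(toy_counts)  # (8, 2, 1, 39)
print(binary_metrics(*toy_counts))

# Three-class overall accuracy needs only the diagonal; these are the counts
# reported for the proposed DenseNet201 model.
overall_accuracy = (764 + 709 + 121) / 1729
covid_sensitivity = 121 / 129
print(round(overall_accuracy * 100, 2), round(covid_sensitivity * 100, 1))  # 92.19 93.8
```

Specificity cannot be recomputed the same way from the text alone, because it depends on the off-diagonal FP counts shown only in Fig. 3.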
VGG16 and VGG19 show high accuracy, as they are large networks that perform well on the relatively large dataset used in this work. VGG16 and VGG19 achieve the same classification accuracy, but VGG19 has the higher sensitivity, and VGG16 has the lowest sensitivity in detecting COVID-19. The F1-score, the harmonic mean of precision and recall, is included to balance the exactness (precision) and completeness (recall) of each model. VGG19 achieved the highest F1-score, 0.91, followed by DenseNet201 with 0.90. MobileNetV2 achieved an exceptionally low F1-score of 0.66. To justify the transfer learning approach quantitatively, we also compared our proposed model with a lightweight CNN designed from scratch. The lightweight CNN has fewer parameters, which makes it faster, and it performs similarly to the state-of-the-art pretrained models considered. The CNN architecture consists of repeated blocks of convolution, batch normalization, and pooling, as shown in Fig. 6. The architecture has five convolution and pooling layers and a fully connected classification layer. The convolutional and hidden layers are followed by the ReLU activation function. The convolutional layers use 64, 128, and 256 filters with a 3 × 3 kernel and a stride of 1. The pooling layers use a 2 × 2 kernel with a stride of 2. Before convolution, the inputs are padded using "same" padding in TensorFlow. The results of training and evaluating this model on our dataset are provided in Table 4. The model achieved high sensitivity and specificity, with an overall classification accuracy of 89.7%. Using the transfer learning approach, our proposed model outperformed the lightweight CNN in terms of overall classification accuracy, sensitivity, specificity, and F1-score, providing a more accurate diagnosis.
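The shape arithmetic of the lightweight CNN can be checked with a short sketch. It assumes a 224 × 224 input, five stages that each apply a 3 × 3 convolution with stride 1 and "same" padding followed by 2 × 2 pooling with stride 2 and no padding, and, for the parameter example, a hypothetical first layer of 64 filters on a three-channel input; none of these input details is stated in the text, and the helper names are hypothetical. Batch normalization and ReLU do not change the shape, so they are omitted.

```python
import math

# Sketch of feature-map sizes and conv parameter counts for a stack of
# conv (3x3, stride 1, "same") + pool (2x2, stride 2) stages. Assumed setup.


def conv_same_output(size, stride=1):
    # "same" padding: output = ceil(input / stride), independent of kernel size.
    return math.ceil(size / stride)


def pool_valid_output(size, pool=2, stride=2):
    # Pooling without padding: floor((input - pool) / stride) + 1.
    return (size - pool) // stride + 1


def conv_params(kernel, in_channels, out_channels):
    # Weights (kernel x kernel x in x out) plus one bias per filter.
    return kernel * kernel * in_channels * out_channels + out_channels


size = 224
sizes = []
for _ in range(5):  # five convolution + pooling stages
    size = conv_same_output(size, stride=1)
    size = pool_valid_output(size)
    sizes.append(size)

print(sizes)                   # [112, 56, 28, 14, 7]
print(conv_params(3, 3, 64))   # 1792 for the hypothetical first layer
```

For these even sizes, "same" and unpadded 2 × 2 pooling give identical results, so the 7 × 7 final map does not depend on that choice.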
In this study, we have proposed a fully automated diagnostic tool for classifying COVID-19 from normal and pneumonia cases. A large, imbalanced test dataset (at least 78% larger than those of previous studies) 25, 28, 30 was used to evaluate the model, and deeper CNN models were included in the benchmark analysis. 24 Compared with the method proposed by Ozturk et al., 18 our model achieved an accuracy and sensitivity of 92% and 94% (versus 87% and 85.35%), respectively, on a considerably larger dataset (eight times larger). In terms of specificity and F1-score, increases of 7% and 3%, respectively, were observed. Wang et al. 25 used a large dataset for model evaluation and achieved high classification performance. However, their test dataset may not reflect the actual clinical setting, in which the small number of COVID-19 cases in the testing pool creates a large class imbalance. Compared with their reported sensitivity of 91% and accuracy of 93.3%, our model yielded a sensitivity and accuracy 3% and 0.7% higher, respectively, while our test dataset preserved the class imbalance present in the full dataset. Panwar et al. 26 reported an overall accuracy of 88% (4% lower than our proposed model) based on a binary classification. Although their model achieved 97% in detecting COVID-19, 3% higher than our proposed model, their specificity was 78% (21% lower), and specificity is an important metric in classification. Our model achieved a higher overall accuracy and specificity than those of previous methods. 26, 29 Asnaoui et al. 29 performed a comparative study of various pretrained models using three-class classification. Since they used a fairly large dataset, their DenseNet201 results offer a useful comparison: they reported an overall accuracy of 88.10%, a sensitivity of 75.14%, and an F1-score of 82.04% (4%, 19%, and 8% lower than our proposed model, respectively). Hemdan et al.
30 proposed a DCNN model using DenseNet201 and achieved an overall accuracy of 90% on a balanced test dataset. In terms of sensitivity, Hemdan et al. 30 and Narin et al. 28 reported a sensitivity of 1, but their test datasets included only 5 COVID-19 and 10 normal cases, and no pneumonia cases were included for classification. In comparison, our test dataset was 173 times larger and imbalanced. Since the CXR images were collected from various sources, the developed model was also used to perform a binary classification between normal and pneumonia cases obtained from the same source, to eliminate source bias between these two classes. We achieved a classification accuracy of 95.75%, which shows that whether the data come from different sources or the same source does not influence the results and that our model is robust to images collected from different sources using different machines; however, bias may still exist in the decision boundary between COVID-19 and the other classes. Moreover, x-ray images are more standardized than other medical imaging modalities used for the detection of lung diseases. 45, 46 Our fully automated deep learning-based model based on DenseNet201 detected COVID-19 with high accuracy on a large, imbalanced dataset of CXR images. The proposed model outperformed the other tested models in terms of overall accuracy, sensitivity, and specificity in identifying COVID-19 cases among normal and pneumonia cases. Analyzing a larger number of pretrained models provided insight into how different models perform in medical image classification with transfer learning on a dataset that is a large, realistic representation of the patient population.
Hence, the developed model using CXR images will assist radiologists in the initial diagnosis of COVID-19 and will aid subsequent analysis for clinical prognosis, including determining the severity of the disease in the lungs.

No conflicts of interest, financial or otherwise, are declared by the authors.

References

1. Fractional diffusion on the human proteome as an alternative to the multi-organ damage of SARS-CoV-2
2. Detection of SARS-CoV-2 in different types of clinical specimens
3. Identifying pneumonia in chest x-rays: a deep learning approach
4. Automated triaging of adult chest radiographs with deep artificial neural networks
5. Deep learning-based detection for COVID-19 from chest CT using weak label
6. A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT
7. COVID-19 pneumonia diagnosis using a simple 2D deep learning framework with a single chest CT image: model development and validation
8. High discordance of chest x-ray and CT for detection of pulmonary opacities in ED patients: implications for diagnosing pneumonia
9. Portable chest x-ray in coronavirus disease-19 (COVID-19): a pictorial review
10. Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process
11. Layer-wise learning based stochastic gradient descent method for the optimization of deep convolutional neural network
12. PAC-Bayesian framework-based drop-path method for 2D discriminative convolutional network pruning
13. Using deep learning to enhance cancer diagnosis and classification
14. Ensembles of deep learning architectures for the early diagnosis of the Alzheimer's disease
15. Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection
16. Accurate prediction of COVID-19 using chest x-ray images through deep feature learning model with SMOTE and machine learning classifiers
17. Automatic detection of COVID-19 infection from chest x-ray using deep learning
18. Automated detection of COVID-19 cases using deep neural networks with x-ray images
19. Deep learning-based decision-tree classifier for COVID-19 diagnosis from chest x-ray imaging
20. A deep learning approach to characterize 2019 coronavirus disease (COVID-19) pneumonia in chest CT images
21. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography
22. Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
23. Detection of coronavirus disease (COVID-19) based on deep features
24. Deep learning approaches for COVID-19 detection based on chest x-ray images
25. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest x-ray images
26. Application of deep learning for fast detection of COVID-19 in x-rays using nCOVnet
27. A deep learning approach to detect COVID-19 coronavirus with x-ray images
28. Automatic detection of coronavirus disease (COVID-19) using x-ray images and deep convolutional neural networks
29. Using x-ray images and deep learning for automated detection of coronavirus disease
30. COVIDX-net: a framework of deep learning classifiers to diagnose COVID-19 in x-ray images
31. Densely connected convolutional networks
32. Very deep convolutional networks for large-scale image recognition
33. Deep residual learning for image recognition
34. MobileNetV2: inverted residuals and linear bottlenecks
35. Rethinking the inception architecture for computer vision
36. COVID-19 Image Data Collection
37. RSNA Pneumonia Detection Challenge
38. COVID-19 Radiography Database
39. Actualmed COVID-19 Chest X-ray Dataset Initiative
40. Figure 1 COVID-19 Chest X-ray Dataset Initiative
41. CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning
42. Deep learning-based detection of COVID-19 from chest x-ray images
43. Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest x-ray images
44. Practical recommendations for gradient-based training of deep architectures
45. Classification of lung diseases using deep learning models
46. A survey of deep learning for lung disease detection on medical images: state-of-the-art, taxonomy, issues and future directions

Ukwatta on image processing based on artificial intelligence-based methods. She received her bachelor's degree in electronics and communication engineering from Karunya University. She received her PhD in electrical and computer engineering in 2020 from Carleton University, Canada. She is the recipient of the Carleton Medal for her outstanding graduate work at the PhD level. She worked in the medical devices industry as an R&D engineer for 10 years. His research focuses on the application of computer-based decision support tools to important human decision-making problems, including improving human understanding of physiological data. He is currently an assistant professor at the University of Guelph, Canada, and an adjunct professor in systems and computer engineering at Carleton University, Canada. He has authored more than 90 journal articles and conference proceedings. His research interests include medical image segmentation and registration.

The authors acknowledge funding support from the Natural Sciences and Engineering Research Council of Canada (NSERC) and a COVID-19 seed grant from the University of Guelph.